<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Jai kora]]></title><description><![CDATA[Opinionated takes on AI-augmented development, software architecture, and developer productivity. No fluff, no hype cycles, just what actually works]]></description><link>https://blog.jaikora.com</link><image><url>https://cdn.hashnode.com/uploads/logos/69a034595fd4d31c61463b6e/ab83a20f-fef5-49f7-a0cc-c55471eacb68.png</url><title>Jai kora</title><link>https://blog.jaikora.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 25 Apr 2026 00:58:05 GMT</lastBuildDate><atom:link href="https://blog.jaikora.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Teaching Your AI Tools Instead of Just Using Them]]></title><description><![CDATA[Teaching Your AI Tools Instead of Just Using Them
Every bug fix you make without updating your AI's knowledge base is engineering work you'll do twice. While most developers treat AI as a fancy autocomplete, a growing cohort is building something dif...]]></description><link>https://blog.jaikora.com/teaching-ai-tools-persistent-memory</link><guid isPermaLink="true">https://blog.jaikora.com/teaching-ai-tools-persistent-memory</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[automation]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[knowledge management]]></category><category><![CDATA[Productivity]]></category><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Wed, 15 Apr 2026 13:44:43 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-teaching-your-ai-tools-instead-of-just-using-them">Teaching Your AI Tools Instead of Just Using Them</h1>
<p>Every bug fix you make without updating your AI's knowledge base is engineering work you'll do twice. While most developers treat AI as a fancy autocomplete, a growing cohort is building something different: systems that remember patterns, accumulate lessons, and get smarter with each interaction.</p>
<p>The shift from disposable prompting to persistent AI memory represents the difference between using tools and training colleagues. One approach makes you faster today. The other compounds your capabilities indefinitely.</p>
<h2 id="heading-the-memory-problem-with-standard-ai-usage">The Memory Problem with Standard AI Usage</h2>
<p>Most AI interactions follow the same wasteful pattern. You describe your problem, the AI solves it, you implement the solution, then start over tomorrow with a blank slate. Your hard-won debugging insights evaporate. Your architectural decisions become archaeological mysteries. Your code review feedback turns into repetitive theater.</p>
<p>This is expensive stupidity masquerading as efficiency.</p>
<p>Consider a typical scenario: your background job fails silently due to rate limits. You spend an hour debugging, discover the root cause, implement proper error handling and job resumption. Victory. But when similar issues surface three months later, you're back to square one because your AI assistant has no memory of that investigation.</p>
<p>The solution isn't using AI more. It's teaching AI better.</p>
<h2 id="heading-building-persistent-knowledge-systems">Building Persistent Knowledge Systems</h2>
<p>Effective AI memory requires structured knowledge artifacts, not just chat history. The best practitioners create dedicated knowledge bases that persist across conversations and compound over time.</p>
<p>Start with documentation templates that capture decision patterns:</p>
<pre><code># API Design Patterns - /docs/api-patterns.md

## Authentication Failures
Pattern: Always return 401 with specific error codes
Reason: Frontend needs granular error handling
Last Updated: [Investigation that led to this rule]

## Rate Limiting Strategy
Pattern: Exponential backoff with jitter
Implementation: Use sidekiq-cron with random delays
Context: Gmail API limitations discovered during archive feature
</code></pre><p>These artifacts live outside conversation context windows, preventing the typical AI amnesia that kicks in after lengthy debugging sessions. When your AI encounters similar problems, it references these patterns first before reinventing solutions.</p>
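<p>To make the backoff entry concrete, here's a minimal sketch in Python (the pattern itself is language-agnostic; the <code>base</code> and <code>cap</code> values are illustrative defaults, not numbers from the original investigation):</p>

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with 'full jitter': the retry window grows as
    base * 2^attempt, capped at `cap`, and the actual delay is a uniform
    random fraction of that window so retries don't synchronize."""
    window = min(cap, base * (2 ** attempt))
    return window * rng()
```

<p>The point of capturing this in the pattern doc is that the next rate-limit bug starts from this rule instead of from a blank prompt.</p>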
<h2 id="heading-the-compound-engineering-loop">The Compound Engineering Loop</h2>
<p>True AI training happens in cycles. You encounter a problem, solve it with AI assistance, then extract the general principle for future reference. Each iteration makes the system more capable at handling your specific domain.</p>
<p>The most sophisticated practitioners automate this extraction. After resolving issues, they prompt their AI to identify transferable patterns and update relevant documentation automatically. A Rails upgrade investigation becomes permanent upgrade procedures. A pricing research deep dive becomes reusable frameworks for future product decisions.</p>
<p>This creates genuine expertise accumulation rather than repeated one-off solutions.</p>
<h2 id="heading-implementation-architecture">Implementation Architecture</h2>
<p>Effective AI memory systems require three components:</p>
<p><strong>Knowledge Artifacts</strong>: Structured markdown files capturing patterns, decisions, and procedures. Store these in version control alongside your code.</p>
<p><strong>Retrieval Mechanisms</strong>: Sub-agents that surface relevant artifacts based on current context. Claude Projects and custom GPTs excel at this contextual retrieval.</p>
<p><strong>Update Workflows</strong>: Systematic processes for capturing new insights and updating existing knowledge. The best implementations trigger knowledge updates immediately after problem resolution, while context remains fresh.</p>
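<p>As a sketch of the retrieval piece, here's what a minimal file-based version might look like. It assumes the knowledge base is a directory of markdown files with one <code>##</code>-headed section per pattern (as in the template earlier); the directory layout and keyword matching are illustrative, not a prescribed implementation:</p>

```python
from pathlib import Path

def surface_patterns(docs_dir, query_terms):
    """Return markdown sections whose heading mentions any query term.
    Assumes one '##'-headed section per pattern in each file."""
    hits = []
    for doc in Path(docs_dir).glob("*.md"):
        section_title, body = None, []
        # Trailing "## " acts as a sentinel so the last section gets flushed too.
        for line in doc.read_text().splitlines() + ["## "]:
            if line.startswith("## "):
                if section_title and any(t.lower() in section_title.lower() for t in query_terms):
                    hits.append((doc.name, section_title, "\n".join(body).strip()))
                section_title, body = line[3:].strip(), []
            elif section_title is not None:
                body.append(line)
    return hits
```

<p>Before generating code, the assistant (or a sub-agent) runs a query like this and prepends the matching sections to its context.</p>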
<p>The key insight is treating your AI tools like junior developers who need training, not magical oracles who inherently understand your domain.</p>
<h2 id="heading-why-this-matters-now">Why This Matters Now</h2>
<p>The AI tooling landscape has matured beyond basic chat interfaces. GPTs, Claude Projects, and similar persistent workspaces finally make systematic AI training practical for individual developers and small teams.</p>
<p>More importantly, the companies building genuine AI-powered development velocity are those treating AI as teachable systems rather than disposable assistants. They're accumulating institutional knowledge in AI-accessible formats and seeing compound returns on that investment.</p>
<p>While others debate whether AI will replace developers, the smart money is on developers who can train AI to amplify their specific expertise. The future belongs to those building systems that remember, not just those writing better prompts.</p>
<p>Your next bug fix is an opportunity to teach, not just solve. Make it count.</p>
]]></content:encoded></item><item><title><![CDATA[Why Most AI-Assisted Development is Just Expensive Autocomplete]]></title><description><![CDATA[Why Most AI-Assisted Development is Just Expensive Autocomplete
Your $240-per-year GitHub Copilot subscription is making you feel productive while keeping you trapped in the same fundamental bottlenecks that have plagued software development for deca...]]></description><link>https://blog.jaikora.com/ai-development-expensive-autocomplete</link><guid isPermaLink="true">https://blog.jaikora.com/ai-development-expensive-autocomplete</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[developer productivity]]></category><category><![CDATA[github copilot]]></category><category><![CDATA[software development]]></category><category><![CDATA[tech industry]]></category><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Wed, 01 Apr 2026 12:23:21 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-why-most-ai-assisted-development-is-just-expensive-autocomplete">Why Most AI-Assisted Development is Just Expensive Autocomplete</h1>
<p>Your $240-per-year GitHub Copilot subscription is making you feel productive while keeping you trapped in the same fundamental bottlenecks that have plagued software development for decades. The industry is celebrating AI that can synthesize entire applications from screenshots, but most teams are still clicking through the same tedious verification loops, just with shinier tooling.</p>
<p>The dirty secret nobody wants to admit: we solved the wrong problem.</p>
<h2 id="heading-the-verification-trap">The Verification Trap</h2>
<p>Generation speed was never the bottleneck. Any competent developer can bang out boilerplate faster than they can think through the problem space. The real constraint has always been verification: does this code actually work, integrate properly, and solve the right problem?</p>
<p>AI coding assistants excel at producing plausible-looking code that requires extensive human oversight. Recent studies show 40-60% of AI-generated code needs significant modification before production use. You're not saving time; you're shifting where you spend it. Instead of writing from scratch, you're now debugging someone else's confident mistakes.</p>
<p>The verification loop scales brutally with codebase complexity. On small projects, you can quickly validate AI output. On enterprise systems with intricate dependencies and edge cases, that validation can take hours or days. The AI can't shortcut this process because it has no intuition about what's likely to break.</p>
<h2 id="heading-the-planning-deficit">The Planning Deficit</h2>
<p>AI tools made us intellectually lazy. When autocomplete can flesh out entire functions, why bother sketching the architecture first? When models can implement features from vague descriptions, why write detailed specifications?</p>
<p>This backwards approach burns hours in debugging cycles that proper planning would eliminate. An experienced engineer looks at a broken system and quickly narrows possibilities based on years of failed experiments. The AI runs every experiment sequentially, burning tokens and time.</p>
<p>Planning isn't just about efficiency—it's about building systems that compound. Quick AI-generated solutions create technical debt. Thoughtful architecture creates foundations that accelerate future development.</p>
<h2 id="heading-the-compound-intelligence-gap">The Compound Intelligence Gap</h2>
<p>Most AI-assisted development follows a reset pattern: prompt, generate, ship, repeat. Each interaction starts from zero context. This is expensive autocomplete, not intelligent collaboration.</p>
<p>The companies actually seeing transformative results are building compound systems where AI learns from every bug fix, code review, and architectural decision. Instead of generating isolated code snippets, these systems accumulate context about the codebase, team preferences, and domain-specific patterns.</p>
<p>This requires intentional system design, not just better prompts. You need infrastructure that captures and surfaces institutional knowledge, not another chat interface that forgets everything between sessions.</p>
<h2 id="heading-the-economic-reality-check">The Economic Reality Check</h2>
<p>Stack Overflow usage remains high despite over a million GitHub Copilot subscribers. Developers still need human-curated solutions for complex problems. The productivity gains from AI coding tools are real but modest—certainly not the 10x improvements breathlessly promised in vendor marketing.</p>
<p>Most organizations are spending thousands on AI subscriptions while their core development bottlenecks remain untouched: unclear requirements, fragmented toolchains, poor testing infrastructure, and communication overhead. These unsexy problems don't have VC-funded solutions, but solving them delivers more sustained productivity gains than any coding assistant.</p>
<h2 id="heading-what-actually-works">What Actually Works</h2>
<p>The teams seeing genuine acceleration from AI aren't using it as a magic code generator. They're building systems that amplify human expertise rather than replacing human judgment.</p>
<p>This means treating AI as one component in a larger productivity stack: better planning processes, cleaner interfaces between systems, comprehensive testing suites, and institutional knowledge capture. The AI becomes more effective when embedded in well-designed workflows, not deployed as a standalone miracle cure.</p>
<p>Start with the fundamentals: clear problem statements, modular architectures, and fast feedback loops. Then layer in AI assistance strategically, focusing on areas where verification costs are manageable.</p>
<h2 id="heading-the-path-forward">The Path Forward</h2>
<p>The current AI coding hype cycle will plateau as teams confront these verification and planning realities. The companies that emerge ahead will be those that used AI as an excuse to rebuild their development processes from first principles, not those that bolted chatbots onto legacy workflows.</p>
<p>Stop chasing the latest model release and start fixing your actual bottlenecks. Your future self will thank you when you're shipping faster than competitors still fighting their expensive autocomplete.</p>
]]></content:encoded></item><item><title><![CDATA[Building Compound Engineering Systems That Learn From Every Bug Fix]]></title><description><![CDATA[Building Compound Engineering Systems That Learn From Every Bug Fix
Most AI coding tools give you short-term speed boosts then reset to zero knowledge. You write code, ship features, fix bugs, and next week you're back to explaining the same patterns...]]></description><link>https://blog.jaikora.com/compound-engineering-ai-systems-learn-bug-fixes</link><guid isPermaLink="true">https://blog.jaikora.com/compound-engineering-ai-systems-learn-bug-fixes</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Thu, 26 Mar 2026 12:10:01 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-building-compound-engineering-systems-that-learn-from-every-bug-fix">Building Compound Engineering Systems That Learn From Every Bug Fix</h1>
<p>Most AI coding tools give you short-term speed boosts then reset to zero knowledge. You write code, ship features, fix bugs, and next week you're back to explaining the same patterns to the same AI. This theatrical performance of productivity misses the real opportunity: building development systems with permanent memory that get smarter with every pull request.</p>
<p>While GitHub Copilot usage exploded 180% in 2023 and investment in AI developer tools hit $2.3B, we're still thinking about AI assistance wrong. The current model treats AI like an intern who never learns from mistakes. Compound engineering flips this script entirely.</p>
<h2 id="heading-what-compound-engineering-actually-means">What Compound Engineering Actually Means</h2>
<p>Compound engineering is the practice of building AI systems that accumulate institutional knowledge rather than starting fresh every session. Instead of prompting an AI to solve today's problem, you're teaching a system that remembers solutions, patterns, and failures across every codebase interaction.</p>
<p>The math is brutal: developers spend 25-50% of their time debugging according to Stack Overflow's 2023 survey. Most of these bugs fall into predictable categories that compound systems would catch automatically after seeing them once. You're literally paying the same debugging tax repeatedly because your AI tools have amnesia.</p>
<p>Every bug fix in a compound system generates three artifacts: the immediate solution, a pattern recognition rule to prevent similar issues, and context about why this failure mode exists. Traditional AI gives you only the first.</p>
<h2 id="heading-the-architecture-of-memory">The Architecture of Memory</h2>
<p>The technical implementation centers on extractable artifacts that persist between sessions. These aren't bloated context windows stuffed with every previous conversation. Instead, you're building a retrieval system where sub-agents pull relevant historical knowledge only when needed.</p>
<p>Here's the critical insight: context compaction kills compound learning. You need to extract lessons while the full conversation history is still accessible, before the AI starts forgetting earlier interactions. This means triggering the compound step manually when context is fresh, not after multiple back-and-forth exchanges.</p>
<p>The artifact structure looks like this:</p>
<ul>
<li><strong>Pattern Rules</strong>: Specific coding patterns that caused issues, with prevention logic</li>
<li><strong>Style Preferences</strong>: Architectural decisions and naming conventions that proved effective</li>
<li><strong>Domain Knowledge</strong>: Business logic rules and edge cases specific to your codebase</li>
<li><strong>Error Categories</strong>: Classification of bug types with automatic detection triggers</li>
</ul>
<p>These artifacts live as structured files, not conversation history. When you start a new coding session, the AI queries this knowledge base for relevant patterns rather than starting from zero.</p>
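<p>One hedged sketch of what "structured files the AI queries" could mean in practice: artifacts stored as JSON Lines with a <code>kind</code> and <code>tags</code>, plus a lookup that pulls only the relevant ones into a session. The schema here is invented for illustration:</p>

```python
import json
from pathlib import Path

def load_artifacts(path):
    """Read one artifact per line from a JSON Lines knowledge file."""
    return [json.loads(line) for line in Path(path).read_text().splitlines() if line.strip()]

def relevant(artifacts, tags):
    """Keep artifacts sharing at least one tag with the current task."""
    wanted = {t.lower() for t in tags}
    return [a for a in artifacts if wanted & {t.lower() for t in a.get("tags", [])}]
```

<p>The session bootstrap then becomes: load, filter by the feature's tags, inject into context, instead of replaying old conversations.</p>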
<h2 id="heading-implementation-strategy">Implementation Strategy</h2>
<p>Start with bug categorization. Every time you fix a bug, force yourself to answer: "What category of error was this, and how could the system detect similar issues automatically?" The key is building detection rules that generalize beyond the specific instance.</p>
<p>For code reviews, extract two types of knowledge: style preferences that should become defaults, and architectural patterns that worked well for similar problems. The goal isn't just documenting decisions but encoding them into the AI's future behavior.</p>
<p>The compound step itself is straightforward. After completing any substantial development work, run an extraction prompt that identifies:</p>
<ul>
<li>Recurring patterns in the code changes</li>
<li>Error types that appeared multiple times</li>
<li>Style decisions that proved effective</li>
<li>Domain-specific logic that should inform future work</li>
</ul>
<p>Store these as structured artifacts, not natural language summaries. The AI needs to query and apply this knowledge programmatically, not just reference it conversationally.</p>
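<p>For instance, "structured artifacts, not natural language summaries" might look like this: a typed <code>Lesson</code> record and a merge step that deduplicates extraction output against the existing knowledge base. The field names are assumptions for the sketch, not a fixed schema:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lesson:
    category: str      # e.g. "error_pattern", "style", "domain"
    summary: str       # one-line rule the AI can apply programmatically
    detail: str = ""   # context for human readers

def merge_lessons(existing, extracted):
    """Append newly extracted lessons, skipping duplicates by (category, summary)."""
    seen = {(l.category, l.summary) for l in existing}
    out = list(existing)
    for lesson in extracted:
        key = (lesson.category, lesson.summary)
        if key not in seen:
            seen.add(key)
            out.append(lesson)
    return out
```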
<h2 id="heading-beyond-individual-productivity">Beyond Individual Productivity</h2>
<p>The real leverage comes from team-level compound systems. When your entire engineering organization feeds lessons into shared knowledge artifacts, you're building institutional memory that survives employee turnover and scales across codebases.</p>
<p>Single developers using compound engineering report productivity gains equivalent to 5x traditional development. The system learns your preferences, remembers your mistakes, and applies hard-won architectural decisions automatically.</p>
<p>But the deeper shift is philosophical: you stop thinking about today's code and start thinking about tomorrow's system. Every bug fix becomes incomplete until it teaches the AI to prevent similar failures. Code reviews feel wasteful unless they extract reusable patterns.</p>
<h2 id="heading-the-compound-advantage">The Compound Advantage</h2>
<p>Traditional AI engineering optimizes for today's velocity. Compound engineering optimizes for tomorrow's capability. Three months of compound engineering fundamentally changes how you approach development: the system gets smarter while your codebase gets more complex, creating positive rather than negative scaling.</p>
<p>The future of development isn't just faster coding. It's self-improving systems that learn from every mistake, remember every preference, and apply accumulated wisdom automatically. Most teams are still treating AI like expensive autocomplete. The teams building compound systems are building something closer to institutional intelligence.</p>
<p>Start extracting lessons from your next bug fix. Your future self will thank you.</p>
]]></content:encoded></item><item><title><![CDATA[Building Compound Engineering Systems That Learn From Every Pull Request]]></title><description><![CDATA[Building Compound Engineering Systems That Learn From Every Pull Request
Most AI coding assistants help you ship faster today but leave you starting from scratch tomorrow. Here's how to build development systems that accumulate knowledge and make eve...]]></description><link>https://blog.jaikora.com/building-ai-systems-learn-pull-requests</link><guid isPermaLink="true">https://blog.jaikora.com/building-ai-systems-learn-pull-requests</guid><category><![CDATA[AI]]></category><category><![CDATA[automation]]></category><category><![CDATA[developer productivity]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Tue, 24 Mar 2026 13:11:15 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-building-compound-engineering-systems-that-learn-from-every-pull-request">Building Compound Engineering Systems That Learn From Every Pull Request</h1>
<p>Most AI coding assistants help you ship faster today but leave you starting from scratch tomorrow. Here's how to build development systems that accumulate knowledge and make every subsequent feature easier to implement.</p>
<p>The current AI development paradigm is fundamentally broken. You prompt Claude or Copilot, it spits out code, you ship it, then next sprint you start fresh. Your AI has no memory of your architecture decisions, your bug patterns, or why you chose React over Vue six months ago. You're paying for intelligence that forgets everything the moment you close your terminal.</p>
<p>This is the difference between AI engineering and compound engineering. AI engineering makes you faster today. Compound engineering makes you faster tomorrow, and exponentially faster next month.</p>
<h2 id="heading-the-memory-problem">The Memory Problem</h2>
<p>Traditional AI tools operate in a vacuum. Every interaction is context-free, every code generation starts from first principles. You waste cycles explaining your tech stack, your coding conventions, your business logic constraints. The AI generates perfectly functional code that completely ignores the patterns you've spent years establishing.</p>
<p>Meanwhile, the best human engineers accumulate institutional knowledge. They remember why certain approaches failed, which libraries caused deployment headaches, how your team prefers to structure error handling. This knowledge compounds across projects.</p>
<p>Compound engineering systems replicate this behavior algorithmically.</p>
<h2 id="heading-building-systems-that-learn">Building Systems That Learn</h2>
<h3 id="heading-1-pull-request-pattern-recognition">1. Pull Request Pattern Recognition</h3>
<p>Start by instrumenting your PR process to capture decision patterns. Every code review contains training data your AI should internalize:</p>
<ul>
<li>Which variable naming conventions get approved versus rejected</li>
<li>How your team structures error handling across different service layers  </li>
<li>What testing patterns consistently pass review</li>
<li>Which architectural decisions get flagged during review</li>
</ul>
<p>Store this feedback in a structured format your AI can reference. When generating new code, it should default to patterns that previously passed review rather than generic best practices.</p>
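<p>A minimal sketch of that structured format, assuming a JSON Lines log of review verdicts and a query for patterns that consistently pass review (the file layout and verdict labels are illustrative):</p>

```python
import json
from collections import Counter

def record_review(log_path, pattern, verdict):
    """Append one review outcome; verdict is 'approved' or 'changes_requested'."""
    with open(log_path, "a") as f:
        f.write(json.dumps({"pattern": pattern, "verdict": verdict}) + "\n")

def approved_defaults(log_path, min_approvals=2):
    """Patterns approved at least min_approvals times and never rejected."""
    approvals, rejections = Counter(), Counter()
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            (approvals if entry["verdict"] == "approved" else rejections)[entry["pattern"]] += 1
    return sorted(p for p, n in approvals.items() if n >= min_approvals and rejections[p] == 0)
```

<p>Patterns that clear the bar get promoted into the defaults your AI generates from; rejected ones become things it avoids.</p>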
<h3 id="heading-2-bug-category-prevention">2. Bug Category Prevention</h3>
<p>Every bug fix should prevent its entire category going forward. When you patch a null pointer exception, your system should recognize the pattern and flag similar code structures before they hit production.</p>
<p>This requires moving beyond reactive debugging to proactive pattern prevention. Your AI should scan new code against historical bug patterns and surface potential issues during development, not after deployment.</p>
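<p>As an illustration of generalizing a fix into a detection rule, here's a deliberately simple regex-based scanner. A real system would work at the AST level; the two rules shown are hypothetical examples of "historical bug patterns":</p>

```python
import re

# Illustrative detection rules; each entry generalizes one fixed bug into a check.
BUG_PATTERNS = [
    ("bare-except", re.compile(r"except\s*:"), "Bare except swallows errors silently"),
    ("mutable-default", re.compile(r"def \w+\([^)]*=\s*(\[\]|\{\})"), "Mutable default argument"),
]

def scan(source):
    """Return (line_no, rule_id, message) for every known bug pattern found."""
    findings = []
    for n, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern, message in BUG_PATTERNS:
            if pattern.search(line):
                findings.append((n, rule_id, message))
    return findings
```

<p>Run against a diff before review, this surfaces category-level repeats while the change is still cheap to fix.</p>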
<h3 id="heading-3-architecture-context-accumulation">3. Architecture Context Accumulation</h3>
<p>Your AI should understand why you made specific technology choices. When you chose PostgreSQL over MongoDB, when you decided against microservices, when you picked Next.js over pure React. These decisions have downstream implications for every subsequent feature.</p>
<p>Maintain a living architectural decision record that your AI references when generating code. New features should align with established patterns rather than introducing architectural drift.</p>
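<p>One way such a living decision record could be made queryable: plain-text ADR blocks parsed into dictionaries and filtered by component. The <code>Decision:</code>/<code>Why:</code>/<code>Applies to:</code> field names are assumptions for the sketch:</p>

```python
def parse_adrs(text):
    """Parse blank-line-separated ADR blocks of 'Key: value' lines."""
    decisions = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            entry[key.strip().lower()] = value.strip()
        if "decision" in entry:
            decisions.append(entry)
    return decisions

def adrs_for(decisions, component):
    """Decisions whose 'Applies to' field mentions the given component."""
    return [d for d in decisions if component.lower() in d.get("applies to", "").lower()]
```

<p>Starting a new persistence feature, you'd prepend <code>adrs_for(decisions, "persistence")</code> to the prompt so generated code aligns with past choices.</p>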
<h2 id="heading-implementation-strategy">Implementation Strategy</h2>
<h3 id="heading-phase-1-data-collection">Phase 1: Data Collection</h3>
<p>Instrument your development workflow to capture decision-making data:</p>
<ul>
<li>PR comments and approval patterns</li>
<li>Code review feedback loops</li>
<li>Bug report classifications</li>
<li>Performance optimization choices</li>
</ul>
<p>This data becomes your AI's training corpus. The goal isn't perfect code generation but consistent code generation that aligns with your team's established patterns.</p>
<h3 id="heading-phase-2-context-integration">Phase 2: Context Integration</h3>
<p>Build AI workflows that reference accumulated context:</p>
<ul>
<li>Before generating new code, query historical patterns for similar functionality</li>
<li>Surface relevant architectural decisions when starting new features</li>
<li>Flag potential issues based on previous bug categories</li>
<li>Suggest testing approaches that align with your established coverage patterns</li>
</ul>
<h3 id="heading-phase-3-feedback-loops">Phase 3: Feedback Loops</h3>
<p>Create systems where every development decision updates the knowledge base:</p>
<ul>
<li>Failed deployments update deployment patterns</li>
<li>Performance issues update optimization guidelines</li>
<li>Security reviews update security protocols</li>
<li>User feedback updates feature prioritization logic</li>
</ul>
<h2 id="heading-the-compound-effect">The Compound Effect</h2>
<p>After three months of compound engineering, you stop writing code and start teaching systems. Every bug fix becomes a permanent lesson. Every code review updates the defaults. Every architectural decision prevents future inconsistency.</p>
<p>The complexity of your codebase still grows, but now your AI's knowledge grows alongside it. What once required extensive context-setting now happens automatically. Your development velocity doesn't just increase—it accelerates.</p>
<h2 id="heading-beyond-individual-productivity">Beyond Individual Productivity</h2>
<p>Compound engineering systems scale beyond single developers. When your entire team contributes to the same learning system, junior developers inherit senior-level decision-making patterns. New hires onboard faster because the AI already knows your conventions.</p>
<p>This approach transforms AI from a productivity tool into a knowledge multiplier. Instead of making individuals faster, it makes entire organizations smarter.</p>
<p>Most teams are still stuck in the prompt-code-ship cycle, treating AI like a sophisticated autocomplete. Build systems that learn from every decision, and you'll be shipping like a team of five while your competitors are still explaining their tech stack to ChatGPT.</p>
]]></content:encoded></item><item><title><![CDATA[Getting Started With AI Is Easy. Building Your Workflow Is the Real Work.]]></title><description><![CDATA[Getting Started With AI Is Easy. Building Your Workflow Is the Real Work.
Everyone is showing you their AI workflow. Nobody tells you it took months of failed experiments to get there.
The honeymoon p]]></description><link>https://blog.jaikora.com/getting-started-with-ai-is-easy-building-your-workflow-is-the-real-work-1</link><guid isPermaLink="true">https://blog.jaikora.com/getting-started-with-ai-is-easy-building-your-workflow-is-the-real-work-1</guid><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Fri, 20 Mar 2026 22:45:42 GMT</pubDate><enclosure url="https://iili.io/qUybkJt.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Getting Started With AI Is Easy. Building Your Workflow Is the Real Work.</h1>
<p>Everyone is showing you their AI workflow. Nobody tells you it took months of failed experiments to get there.</p>
<p>The honeymoon phase is officially over. METR's 2025 study revealed a brutal truth: developers think AI makes them 24% faster, but they're actually 19% slower. That's a 43-point perception gap between reality and marketing hype. Stack Overflow's developer survey shows trust in AI tools plummeting from 40% to 29%. The almost-right output that requires more debugging than writing from scratch has become the developer's new nightmare.</p>
<p>Yet some teams are shipping like they've multiplied their engineering capacity by five. The difference isn't the tools. It's the system.</p>
<h2>The Cargo Cult Problem</h2>
<p>Most AI adoption follows the same doomed pattern: see a demo, copy the surface behaviors, wonder why it doesn't work. You watch someone effortlessly prompt Claude to refactor their entire component system, so you fire up Cursor and start typing. Three hours later, you're debugging AI-generated code that almost works but breaks in subtle ways that take longer to fix than writing it yourself.</p>
<p>This is cargo cult engineering. You've replicated the visible actions without understanding the invisible infrastructure that makes them effective.</p>
<p>The real workflow happens before you touch the AI. It's research patterns, context preparation, and systematic iteration. The successful AI engineers I know spend 15-20 minutes researching before writing their first prompt. They're searching their existing codebase for similar patterns, reading documentation, and synthesizing approaches. The AI becomes the implementation engine for a plan they've already validated.</p>
<h2>From Programmer to Orchestra Conductor</h2>
<p>The mindshift is profound. Individual contributors optimize for getting things done. AI-augmented developers optimize for getting the right things done in parallel. Instead of writing one function at a time, you're managing multiple Claude Code tabs working on different features through separate git worktrees. Your monitor looks like mission control because you're no longer coding; you're conducting.</p>
<p>This requires unlearning how you approach problems. Traditional development is sequential: understand, plan, implement, test, debug. AI development is parallel: understand everything at once, plan multiple approaches simultaneously, implement through delegation, orchestrate the integration.</p>
<p>The developers who've made this transition successfully report shipping 5x more code. Not because they type faster, but because they think differently about scope and delegation.</p>
<h2>The Compounding Effect</h2>
<p>The real power emerges when you stop thinking about AI as a productivity tool and start treating it as a learning system. Every bug fix teaches the system. Every code review updates the defaults. Every pull request becomes institutional knowledge.</p>
<p>This is the difference between AI engineering and compound engineering. AI engineering makes you faster today. Compound engineering makes you faster tomorrow and every day after.</p>
<p>Building this requires infrastructure most developers skip: documented patterns, extractable lessons, systematic capture of what works. When you hit a Gmail rate limiting issue while building an email cleanup feature, that becomes a permanent lesson that prevents the entire category of problems going forward.</p>
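<p>What "systematic capture" can look like in practice is almost embarrassingly simple: a structured lesson log that gets compiled back into agent context. The sketch below is illustrative only; the filename, field names, and <code>log_lesson</code>/<code>lessons_for</code> helpers are hypothetical, not part of any particular tool.</p>

```python
import json
from pathlib import Path

LESSON_FILE = Path("lessons.jsonl")  # hypothetical store of accumulated lessons


def log_lesson(category: str, problem: str, rule: str) -> dict:
    """Append a structured lesson so it can be injected into agent context later."""
    entry = {"category": category, "problem": problem, "rule": rule}
    with LESSON_FILE.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def lessons_for(category: str) -> list[dict]:
    """Load every recorded lesson in a category, e.g. to prepend to a prompt."""
    if not LESSON_FILE.exists():
        return []
    return [
        e for e in map(json.loads, LESSON_FILE.read_text().splitlines())
        if e["category"] == category
    ]


# Example: the Gmail rate-limiting incident becomes a permanent rule,
# not a bug you rediscover on the next email feature.
log_lesson(
    "external-apis",
    "Gmail API returned 429s during bulk email cleanup",
    "Batch Gmail API calls and back off exponentially on 429 responses",
)
```

<p>The point isn't the storage format. It's that every fix produces an artifact the system reads next time, which is exactly what makes the learning compound.</p>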
<p>The teams shipping like much larger organizations have built systems with memory. Their AIs don't just complete tasks, they accumulate expertise.</p>
<h2>Why Most Workflows Fail</h2>
<p>DORA's 2025 report shows individual task completion up 21% but organizational delivery metrics flat. PR review times are up 91%. This tells the whole story: people are generating more code, but they're not shipping better software faster.</p>
<p>The problem is treating AI like a faster keyboard instead of a different kind of teammate. You wouldn't hand a junior developer a vague prompt and expect production-ready code. But that's exactly how most people use Claude or Cursor.</p>
<p>Successful AI workflows have structure: clear problem definition, researched approaches, explicit success criteria, systematic feedback loops. They treat the AI like a very capable intern who needs good direction and honest feedback.</p>
<h2>The Real Work</h2>
<p>Getting started with AI coding tools takes an afternoon. Building a workflow that actually makes you faster takes months of deliberate practice and systematic iteration.</p>
<p>You need to develop new instincts: when to delegate versus when to direct, how to structure context for parallel processing, what kinds of problems benefit from AI augmentation versus traditional approaches.</p>
<p>The developers who've cracked this code don't just prompt better. They've rebuilt their entire approach to software development around systematic leverage. They think in terms of compound learning, parallel execution, and systematic capture of institutional knowledge.</p>
<p>That's not a workflow you copy from a blog post. It's a capability you build through months of thoughtful experimentation.</p>
<p>Stop looking for the perfect AI workflow to copy. Start building your own systematic approach to compound engineering. The tools are commodities. The system is your competitive advantage.</p>
]]></content:encoded></item><item><title><![CDATA[Is Code Review Built for the AI Era?]]></title><description><![CDATA[Why Your Code Review Process Can't Keep Up with AI-Assisted Development
When your AI can generate entire features in minutes but verification takes days, you've inverted the entire software development]]></description><link>https://blog.jaikora.com/is-code-review-built-for-ai-era</link><guid isPermaLink="true">https://blog.jaikora.com/is-code-review-built-for-ai-era</guid><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Mon, 16 Mar 2026 06:47:07 GMT</pubDate><enclosure url="https://iili.io/qg9HrKv.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Why Your Code Review Process Can't Keep Up with AI-Assisted Development</h1>
<p>When your AI can generate entire features in minutes but verification takes days, you've inverted the entire software development equation. The bottleneck has shifted from writing code to proving it works, and most teams are handling this transition catastrophically badly.</p>
<p>The traditional code review process was designed for a world where writing code was expensive. That world is gone. Claude can refactor your authentication system while you grab coffee, and GitHub Copilot scaffolds complete APIs faster than you can type the function signature.</p>
<p>Yet teams are still running code review like it's 2019, creating verification bottlenecks that make AI-accelerated development feel like driving a Ferrari in rush hour traffic.</p>
<h2>The Great Inversion</h2>
<p>For decades, the equation was simple: writing code was hard, reviewing it was easy. You'd spend hours crafting implementations, then a colleague would spend 15 minutes scanning for bugs and style violations.</p>
<p>Now? Your AI just generated 500 lines of production-ready code in three minutes. It handles edge cases you forgot, follows your coding standards, and includes comprehensive error handling. But your code review process still assumes those 500 lines represent days of human thought requiring equally careful human validation.</p>
<p>This is process debt: applying legacy verification to AI-generated code.</p>
<p>Consider what happens when AI generates a complex feature:</p>
<ul>
<li><p><strong>Writes comprehensive tests</strong> covering edge cases humans miss</p>
</li>
<li><p><strong>Follows established patterns</strong> more consistently than tired developers</p>
</li>
<li><p><strong>Generates matching documentation</strong></p>
</li>
<li><p><strong>Handles error cases</strong> with mechanical precision</p>
</li>
</ul>
<p>But human reviewers still hunt for bugs AI doesn't create while missing the bugs AI does create.</p>
<h2>What AI Gets Wrong (And What It Gets Right)</h2>
<p>Here's the uncomfortable truth: AI is better than most developers at writing boring, correct code. It doesn't get distracted, cut corners when tired, or introduce bugs while thinking about weekend plans.</p>
<p>But AI fails predictably in ways traditional code review completely misses:</p>
<p><strong>Context Blindness</strong>: AI writes perfect code solving the wrong problem. It implements flawless caching when you needed to fix a database query.</p>
<p><strong>Integration Ignorance</strong>: AI excels at isolated problems but creates system-wide bottlenecks.</p>
<p><strong>Requirements Drift</strong>: AI implements exactly what you asked for, which is rarely what you need.</p>
<p>Traditional code review catches none of these because it focuses on implementation quality, not problem alignment.</p>
<h2>The New Verification Framework</h2>
<p>Smart teams aren't abandoning verification—they're evolving it. Here's what works:</p>
<h3>Layer 1: Automated Verification</h3>
<p>If AI can write code, AI can verify most of it:</p>
<ul>
<li><p><strong>Enhanced static analysis</strong> checking architectural patterns beyond syntax</p>
</li>
<li><p><strong>Automated security scanning</strong> understanding AI-specific vulnerabilities</p>
</li>
<li><p><strong>Integration testing</strong> validating system-wide behavior</p>
</li>
<li><p><strong>Performance regression testing</strong> catching subtle AI inefficiencies</p>
</li>
</ul>
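<p>A Layer-1 check can be as small as a guard that runs in CI. The sketch below shows one flavor: a performance regression guard that compares a function's runtime against a stored baseline, the kind of subtle inefficiency human reviewers rarely catch in AI-generated rewrites. The function name, baseline, and tolerance are illustrative assumptions, not a prescribed tool.</p>

```python
import time


def regression_guard(fn, baseline_seconds: float, tolerance: float = 1.5):
    """Run fn and fail fast if it is slower than baseline * tolerance.

    Automated checks like this catch the quiet slowdowns an AI rewrite
    can introduce while the code still passes every functional test.
    """
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    if elapsed > baseline_seconds * tolerance:
        raise AssertionError(
            f"performance regression: {elapsed:.4f}s vs baseline {baseline_seconds}s"
        )
    return result


# Example: guard a (trivial) hot path against a 50% slowdown.
total = regression_guard(lambda: sum(range(10_000)), baseline_seconds=0.05)
```

<p>The same shape works for the other items in the list: swap the timing measurement for a static-analysis pass or an integration probe, and keep the fail-fast contract.</p>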
<h3>Layer 2: Intent Verification</h3>
<p>Human reviewers ask different questions:</p>
<ul>
<li><p>Does this solve the actual problem?</p>
</li>
<li><p>Will this create issues for other teams?</p>
</li>
<li><p>Does this align with our architectural direction?</p>
</li>
<li><p>Are we building the right thing?</p>
</li>
</ul>
<p>This requires business context understanding, not codebase knowledge.</p>
<h3>Layer 3: Contextual Integration</h3>
<p>Humans verify integration points:</p>
<ul>
<li><p><strong>API contract compatibility</strong></p>
</li>
<li><p><strong>Data flow implications</strong></p>
</li>
<li><p><strong>Operational impact</strong></p>
</li>
<li><p><strong>Team coordination needs</strong></p>
</li>
</ul>
<h2>The Teams Getting It Right</h2>
<p>Productive AI-assisted teams treat AI-generated code like output from a brilliant but junior developer: technically proficient but potentially missing context.</p>
<p>They focus reviews on architectural alignment, business logic validation, and system integration rather than syntax checking. They generate comprehensive test suites alongside implementation, using tests as specifications.</p>
<h2>The Death Spiral of Traditional Review</h2>
<p>Teams clinging to legacy processes create death spirals:</p>
<ol>
<li><p>AI generates code faster than humans can review</p>
</li>
<li><p>Review queues grow, slowing delivery</p>
</li>
<li><p>Pressure mounts to rubber-stamp reviews</p>
</li>
<li><p>Quality suffers, reinforcing distrust of AI code</p>
</li>
<li><p>Process becomes more rigid and slower</p>
</li>
</ol>
<p>This trains developers to distrust AI-generated code while preventing them from developing skills to work effectively with AI.</p>
<h2>What Dies, What Lives</h2>
<p><strong>Dying:</strong></p>
<ul>
<li><p>Manual syntax checking</p>
</li>
<li><p>Human bug hunting for logic errors</p>
</li>
<li><p>Style guide enforcement</p>
</li>
<li><p>Boilerplate validation</p>
</li>
</ul>
<p><strong>Evolving:</strong></p>
<ul>
<li><p>Architecture alignment verification</p>
</li>
<li><p>Business logic validation</p>
</li>
<li><p>Integration impact assessment</p>
</li>
<li><p>Context and requirements verification</p>
</li>
</ul>
<h2>The Way Forward</h2>
<p>Stop checking what machines check better. Start checking what machines can't check. Focus human attention on problem alignment, business logic correctness, and system integration.</p>
<p>Assume AI implementation is technically correct and focus on whether it's strategically right. Train reviewers in AI failure modes, not human failure modes.</p>
<p>The teams that figure this out first will ship faster with higher quality. The teams that don't will find themselves unable to compete with AI-accelerated development cycles.</p>
<p>Code review isn't disappearing—it's evolving into something more strategic. But only if we kill the parts that no longer serve us.</p>
]]></content:encoded></item><item><title><![CDATA[Your Agents Are Only as Good as What You Tell Them to Care About]]></title><description><![CDATA[Most teams deploying AI agents are copying someone else's homework.
They find a promising setup from a blog post, a DHH tweet, or an Every.to breakdown. They import the prompts, the tools, the workflow structure. Two weeks later they're spending half...]]></description><link>https://blog.jaikora.com/agent-capability-specification-ai-teams</link><guid isPermaLink="true">https://blog.jaikora.com/agent-capability-specification-ai-teams</guid><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Sat, 28 Feb 2026 10:20:48 GMT</pubDate><enclosure url="https://iili.io/qg9JMbt.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most teams deploying AI agents are copying someone else's homework.</p>
<p>They find a promising setup from a blog post, a DHH tweet, or an Every.to breakdown. They import the prompts, the tools, the workflow structure. Two weeks later they're spending half their day editing outputs and wondering why the agents aren't performing like the case study promised.</p>
<p>I've done this. I watched a consulting client do this. The problem isn't the agents. It's that the specification doesn't belong to you.</p>
<h2 id="heading-the-do-it-all-trap">The "do-it-all" trap</h2>
<p>Agents are generalists by default. Left without tight constraints, a coding agent will handle research, write documentation, make architectural decisions, and suggest refactors you didn't ask for. A project management agent will draft communications, flag risks, update statuses, and send summaries nobody reads.</p>
<p>This isn't a bug. It's how they're trained. The model wants to be helpful across every dimension it can reach.</p>
<p>With a human team member, you take certain things for granted. If you hire someone with strong communication instincts, you don't need to specify "make sure your stakeholder updates are clear." They read the room. They adapt. Years of social context fill in the gaps you never bother to articulate.</p>
<p>Agents don't have that social context. Every assumption you leave unspecified gets filled with a default, and the default is "try everything."</p>
<p>The result is agents that are technically capable but organisationally noisy. You end up reviewing and correcting output that was never in scope in the first place.</p>
<h2 id="heading-the-real-constraint-has-shifted">The real constraint has shifted</h2>
<p>The two-pizza rule solved a real problem: communication overhead. Small teams mean fewer coordination channels, fewer meetings, less friction as decisions move through layers. Amazon built a successful engineering culture on it for 24 years.</p>
<p>But that rule assumed humans were the scarce resource and communication was the bottleneck.</p>
<p>Neither of those assumptions holds anymore.</p>
<p>At Syncio, we're a team of five with agents running across client work, internal tooling, and product development. The constraint isn't headcount. It isn't even compute. It's precision of instruction. The clearer we are about what each agent is responsible for and, critically, what it is not responsible for, the better everything runs.</p>
<p>This is the new leverage point. Not "how many agents do we have" but "how precisely have we defined what each one cares about."</p>
<h2 id="heading-what-specification-actually-looks-like">What specification actually looks like</h2>
<p>When I built the multi-agent architecture for BlogBuddy, we had four agents running in sequence: an OnboardingAgent, a CalendarAgent, a WriterAgent, and an EditorAgent. Early in the build, the WriterAgent was doing too much. It was making editorial judgements that belonged to the EditorAgent, suggesting scheduling changes that belonged to the CalendarAgent, and generally overreaching into adjacent territory because nothing told it not to.</p>
<p>The fix wasn't a better model. It was tighter scope.</p>
<p>We rewrote the WriterAgent's context to be explicit: your job is to produce a draft matching the brief. You do not evaluate the brief. You do not suggest publishing windows. You do not rewrite the hook unless the EditorAgent sends it back. That's it.</p>
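<p>Expressed as code, that tightened scope is just a fixed block of context prepended to every run. This is a reconstruction of the shape, not the actual BlogBuddy prompt; the constant and helper names are illustrative.</p>

```python
WRITER_AGENT_SCOPE = """\
You are the WriterAgent. Your only job is to produce a draft that matches the brief.

In scope:
- Write the draft to the length, tone, and structure the brief specifies.

Out of scope (never act on these):
- Evaluating or questioning the brief.
- Suggesting publishing windows or schedule changes (the CalendarAgent owns those).
- Rewriting the hook, unless the EditorAgent has explicitly sent the draft back.
"""


def build_writer_prompt(brief: str) -> str:
    """Combine the fixed scope with the task-specific brief for one run."""
    return f"{WRITER_AGENT_SCOPE}\nBrief:\n{brief}"
```

<p>Notice that the negative list is longer than the positive one. That's deliberate: the defaults fill every unspecified gap, so the boundaries are what you have to write down.</p>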
<p>Output quality improved immediately. Not because the model got smarter, but because we stopped asking it to be a generalist.</p>
<p>Think of it like an ERP system. When I built a custom ERP system at Nebraska, every module had a defined boundary. The inventory module didn't make purchasing decisions. The purchasing module didn't update financial records directly. Those boundaries weren't limitations. They were what made the system reliable at scale. You could trust the output of each module precisely because it wasn't trying to do everything.</p>
<p>Agent specification works the same way. The constraint is the feature.</p>
<h2 id="heading-why-templates-from-other-companies-fail">Why templates from other companies fail</h2>
<p>This is the mistake I've seen most often with clients.</p>
<p>A team finds a well-documented agent setup, something from Every.to or a popular GitHub repo, and treats it as a starting point. The prompts look reasonable. The tools make sense. So they drop it in and expect similar results.</p>
<p>It doesn't work. Not because the template is bad. Because the template encodes someone else's priorities, someone else's team skills, and someone else's context.</p>
<p>Every.to's compound engineering workflow is excellent for Every.to. They have two engineers who think in a very specific way about planning, GitHub issues, and code review loops. The workflow is built around their cognitive patterns and their codebase history. It compresses knowledge that took months to develop.</p>
<p>When you copy it, you get the structure without the knowledge. You get the skeleton without the muscle memory.</p>
<p>I spent hours on one consulting engagement editing imported agent skills before I realised I'd have been better off starting from scratch. The edits weren't improvements. They were translations. I was trying to convert someone else's context into mine, and that's harder than building fresh.</p>
<p>What works is starting from your team's actual working patterns. What do you currently do well? What do people on your team instinctively get right without thinking? Those are the things you don't need to specify. What do people get wrong, or hand off awkwardly, or lose track of? Those are exactly the areas where agent specification adds value, because you're encoding the discipline that the team currently lacks.</p>
<p>Agents amplify what you give them. Start with your own ground.</p>
<h2 id="heading-a-practical-starting-point">A practical starting point</h2>
<p>If you're building your first agent team, or rebuilding a setup that isn't working, I'd suggest this sequence.</p>
<p>Write down what each agent is responsible for in one sentence. If you can't do it in one sentence, the scope is probably too wide. Then write what each agent is explicitly not responsible for. That second list is the one most people skip. It's also where most of the noise comes from.</p>
<p>Finally, run the agent on real work for a week and track every time you edit its output. Most edits will cluster around a few categories. Those categories tell you exactly where your specification is loose. Tighten those boundaries, not the prompts.</p>
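<p>The edit-tracking step needs nothing fancier than a tagged log and a frequency count. A minimal sketch, with made-up category names:</p>

```python
from collections import Counter

# Each time you edit an agent's output during the trial week, record why.
# The category labels are yours to define; these are examples.
edit_log = [
    "out-of-scope-suggestion",  # agent proposed a refactor nobody asked for
    "wrong-tone",
    "out-of-scope-suggestion",
    "missed-requirement",
    "out-of-scope-suggestion",
]


def loosest_boundaries(log: list[str], top: int = 2) -> list[tuple[str, int]]:
    """Return the edit categories that cluster the most."""
    return Counter(log).most_common(top)


# The dominant category here is scope creep, so the fix is tightening the
# "not responsible for" list, not rewording the main prompt.
print(loosest_boundaries(edit_log))
```
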
<p>The productivity gains in AI-first teams are real. But they come from the clarity of your thinking, not the sophistication of the models. The teams that win aren't the ones with the most agents. They're the ones who know precisely what each agent is there to do.</p>
]]></content:encoded></item><item><title><![CDATA[We need to reimagine 2 pizza rule for AI era]]></title><description><![CDATA[The two-pizza rule was never really about pizza. It was about communication overhead. Jeff Bezos looked at large teams and saw most of the cost sitting in coordination, alignment, and the slow diffusi]]></description><link>https://blog.jaikora.com/we-need-to-reimagine-2-pizza-rule-for-ai-era</link><guid isPermaLink="true">https://blog.jaikora.com/we-need-to-reimagine-2-pizza-rule-for-ai-era</guid><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Sat, 28 Feb 2026 09:54:12 GMT</pubDate><enclosure url="https://iili.io/qg9dnQn.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The two-pizza rule was never really about pizza. It was about communication overhead. Jeff Bezos looked at large teams and saw most of the cost sitting in coordination, alignment, and the slow diffusion of decisions through too many people. Keep the team small enough to feed with two pizzas, and you keep that overhead manageable.</p>
<p>That logic held for twenty years. It does not hold anymore.</p>
<p>When agents are doing real work on your team, the communication model changes completely. Agents do not have hallway conversations. They do not build shared context over time. They do not pick up on tone or infer priorities from body language. Every interaction with an agent is a cold start, and the quality of what comes out is entirely determined by the quality of what goes in.</p>
<p>That shifts the constraint. It is not headcount. It is the clarity of what you ask each agent to do.</p>
<h2>What the Two-Pizza Rule Was Actually Solving</h2>
<p><a href="https://every.to/chain-of-thought/the-two-slice-team">Dan Shipper put it well in Every.to recently</a>: Amazon's small team structure was a solution to communication debt. The more people in a team, the more relational links, the more meetings, the more diluted the ownership. Small teams moved faster because they spent less time on coordination.</p>
<p>Agents do not add to that coordination overhead in the same way. A team of two people with six agents is not an eight-person team for communication purposes. The agents do not send Slack messages. They do not need to be aligned on quarterly goals. They execute the task in front of them.</p>
<p>So if you apply the two-pizza rule to agent-augmented teams, you end up optimising for the wrong thing entirely.</p>
<h2>The Real New Constraint: Decision Surface</h2>
<p>Here is what I have learned running agent systems in production.</p>
<p>With human teams, a capable generalist is a gift. Someone who is strong at communication, good at research, reliable on detail work, you lean on them for everything and it works. You take their adaptability for granted.</p>
<p>With agents, you cannot do that. Agents are technically generalists but they are not reliably generalists. A capable agent given a vague brief will produce a plausible-looking result that may or may not be what you needed. The generalist capability is there, but you are the one who has to activate it correctly for each specific task.</p>
<p>I saw this clearly on a consulting engagement. I helped a team automate workflows by adapting templates from similar companies. The templates looked reasonable on paper. In practice, they delivered mediocre results and I spent most of the time editing outputs rather than using them. Eventually I realised the problem: the templates were not built for this team's actual priorities, this team's specific skill gaps, this team's judgement about what mattered. Starting from scratch, built around their context, produced significantly better results for less effort.</p>
<p>The same principle applies to agents on a product team. The question is not how many agents you have. It is how precisely you have defined what each one is responsible for and what good output looks like.</p>
<h2>Optimise for Decision Surface, Not Headcount</h2>
<p>The new heuristic I use: optimise for decision surface.</p>
<p>For each agent, there should be one clear decision or output it owns. Not a role. Not a broad function. A specific decision or deliverable, with explicit criteria for what success looks like. The tighter that surface, the more reliable the agent.</p>
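<p>You can even enforce the one-decision rule mechanically: refuse any agent spec whose ownership statement runs past a single sentence. This is a sketch with hypothetical names, using sentence count as a crude proxy for scope width; adapt the check to your own briefs.</p>

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    name: str
    owns: str                     # the single decision or output this agent owns
    success_criteria: list[str]   # explicit, checkable criteria for good output
    not_responsible: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Crude proxy for "one clear decision": exactly one sentence.
        sentences = [s for s in self.owns.split(".") if s.strip()]
        if len(sentences) != 1:
            raise ValueError(
                f"{self.name}: 'owns' must be one sentence; the scope is too wide"
            )


editor = AgentSpec(
    name="EditorAgent",
    owns="Decide whether a draft meets the brief and either approve it "
         "or return it with specific fixes.",
    success_criteria=[
        "Every rejection names concrete edits",
        "No rewrites performed directly",
    ],
    not_responsible=["Scheduling", "Drafting new copy"],
)
```

<p>The value isn't the validation itself. It's that writing the <code>owns</code> field forces the upfront thinking the rest of this piece argues for.</p>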
<p>This is a bigger mental shift than it sounds. With humans, you hire people with broad capabilities and trust them to apply judgment to novel situations. You invest in their context over time and that investment compounds. With agents, you do not get that compounding. You get the capability fresh each time, and your job is to make the task surface clear enough that the capability gets applied correctly.</p>
<p>That does not mean you need a separate agent for every tiny task. It means every agent in your system should have a brief you could hand to a new contractor and have them immediately understand what good looks like. If you cannot write that brief, the agent is going to underperform regardless of the model.</p>
<h2>No Template Survives Contact With Your Team</h2>
<p>The other thing the two-pizza rule gave people was a template. A tidy number. Fewer than ten people, small enough for two pizzas. Simple to apply.</p>
<p>With agent teams, there is no equivalent template that transfers. I have seen people copy agent architectures from other companies wholesale and wonder why the results are inconsistent. The architecture looks right. The agents look capable. But the decision surfaces are mapped to someone else's priorities, someone else's quality standards, someone else's judgment about what matters.</p>
<p>Right now there is no established rule for how to structure an agent team. Anyone who tells you there is a formula is selling you something. What exists is a set of principles: keep decision surfaces tight, define output criteria explicitly, do not template from someone else's context, and build incrementally so you learn what actually needs an agent versus what is simpler without one.</p>
<p>At a 5-person AI-first company running production agent systems, the question I ask before adding any agent is not whether an agent could do this. It is whether I can write a brief clear enough that the agent will do this reliably. If the answer is no, the problem is not the agent. It is that we have not done the thinking yet.</p>
<h2>What This Means for How You Structure</h2>
<p>The two-pizza rule forced a useful discipline: keep teams small so you are forced to prioritise. The equivalent discipline in an agent-augmented team is different. It is: keep decision surfaces tight so you are forced to be specific about what matters.</p>
<p>That is harder than it sounds. It requires you to do the thinking upfront that human teams often do implicitly. It requires you to write down what good output looks like, what the agent should and should not consider, what a failed result looks like so you know when to intervene.</p>
<p>It is more work at the start. It produces dramatically better results throughout.</p>
<p>The teams that will figure this out are not the ones that add the most agents. They are the ones that are most disciplined about defining what each agent is actually supposed to decide.</p>
]]></content:encoded></item><item><title><![CDATA[Agents Don't Need MCP to Figure Things Out. But They Do Need Something Better.]]></title><description><![CDATA[Give a capable agent enough time and tokens, and it will work out most things on its own.
I have watched this in production work. An agent navigating an API it was never explicitly told about, piecing together the right endpoints from documentation, ...]]></description><link>https://blog.jaikora.com/agents-mcp-data-efficiency-future</link><guid isPermaLink="true">https://blog.jaikora.com/agents-mcp-data-efficiency-future</guid><dc:creator><![CDATA[Jai Kora]]></dc:creator><pubDate>Thu, 26 Feb 2026 23:04:14 GMT</pubDate><enclosure url="https://iili.io/qg9diZX.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Give a capable agent enough time and tokens, and it will work out most things on its own.</p>
<p>I have watched this in production work. An agent navigating an API it was never explicitly told about, piecing together the right endpoints from documentation, making it work. No MCP. No custom tooling. Just the model, some context, and enough runway.</p>
<p>That is not a reason to abandon structured agent tooling. It is a reason to get clearer about what that tooling is actually for.</p>
<h2 id="heading-the-binary-debate-is-the-wrong-debate">The Binary Debate Is the Wrong Debate</h2>
<p>The MCP conversation right now is too binary. Either you are building MCP servers for everything, or you are on the "MCPs were a mistake, bash is better" side. Both camps are missing the point.</p>
<p>DHH ran a clean experiment in February that illustrates where this is heading. He gave an agent a computer, <a target="_blank" href="https://world.hey.com/dhh/clankers-with-claws-a2f36533">no MCPs, no APIs configured</a>, and a single task. The agent got its own email address, completed a signup flow, sourced images, and joined a Basecamp account from an email invite. No special equipment. Just the model navigating the open web.</p>
<p>His read: agents are heading toward a world where they do not need custom integration layers to interact with their environment.</p>
<p>I think he is right about the direction. I do not think it means MCP is dead.</p>
<h2 id="heading-where-mcp-still-earns-its-keep">Where MCP Still Earns Its Keep</h2>
<p>For complex operations on internal systems, structured tooling still makes sense. When an agent needs to write to a database, modify records, or execute a transactional workflow where partial failure is a real problem, I want a clean, controlled interface between the agent and the system.</p>
<p>The question is not "should I build an MCP server." It is "what is the cost of the agent figuring this out alone versus having a defined interface, and what happens if it gets it wrong."</p>
<p>Low stakes, read-only, web-native: let the agent navigate. High stakes, write operations, internal systems: wrap it properly. That logic is not new. It is the same abstraction decision we made when designing APIs for human developers.</p>
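<p>That abstraction decision is worth writing down as an explicit policy rather than re-deciding ad hoc per integration. A hedged sketch of the routing logic described above; the labels and function name are illustrative:</p>

```python
def interface_for(read_only: bool, internal_system: bool, failure_cost: str) -> str:
    """Decide whether an agent navigates directly or goes through a structured tool.

    failure_cost is "low" or "high" (illustrative labels for what happens
    if the agent gets it wrong).
    """
    if read_only and not internal_system and failure_cost == "low":
        return "direct-navigation"  # let the agent browse and figure it out
    return "structured-tool"        # wrap it: an MCP server or equivalent


# Reading public docs: low stakes, let the agent navigate.
assert interface_for(read_only=True, internal_system=False, failure_cost="low") \
    == "direct-navigation"
# Writing to an internal database: wrap it properly.
assert interface_for(read_only=False, internal_system=True, failure_cost="high") \
    == "structured-tool"
```
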
<h2 id="heading-the-real-constraint-is-data-efficiency">The Real Constraint Is Data Efficiency</h2>
<p>Here is the part that does not get enough attention.</p>
<p>The <a target="_blank" href="https://www.anthropic.com/news/anthropic-2025-state-of-ai-agents">2025 State of AI Agents report</a> found that 46% of teams cite integration with existing systems as their primary challenge. That number will come down as models improve at self-directed navigation. But the deeper constraint will remain: how many tokens does it take to get from a task to a correct result.</p>
<p>Every layer of tooling is a tax on that. Large tool definitions consume context. Poorly designed interfaces force extra calls to clarify what the agent should already know. I have seen agent runs where 40% of the token spend was the model navigating its own tooling rather than doing the actual work. That is a badly designed interface, not a smart agent.</p>
<p>The agents that perform best in production are not the ones with the most tools. They are the ones where every interface is designed around what the agent actually needs to know, in the format it most naturally reasons about.</p>
<p>Most MCP servers today are not that. They are API wrappers with JSON schemas bolted on, built for human-readable documentation that got reformatted for tool calls.</p>
<h2 id="heading-something-better-is-coming">Something Better Is Coming</h2>
<p>MCP is a waypoint, not a destination. The pattern we are converging on is some kind of interface standard designed from the ground up for how models process information, not how REST APIs were designed for developers.</p>
<p>What that looks like exactly, nobody knows yet. But the direction is clear: agent tooling will evolve toward making the right data available in the most token-efficient, latency-minimal way possible. APIs and tooling should accelerate agents, not become another layer they have to work around.</p>
<p>The ones that are not accelerants will get replaced. Not by agents brute-forcing the open web, but by better-designed interfaces that treat the model as the primary consumer.</p>
<p>Build for that.</p>
]]></content:encoded></item></channel></rss>