The instinct makes sense. You're running three or four parallel Claude Code agents and the throughput is noticeably higher than sequential work. So you push further — to eight, to twelve, to twenty. If a few agents are good, more must be better.
What people find instead is that the gains flatten and then reverse. And it isn't just "more overhead": the failure mode at 20 agents is qualitatively different from the failure mode at 5. Things break that didn't break before, and they break in ways that aren't immediately obvious.
This is a breakdown of what those failure modes actually are, because naming them clearly is the prerequisite for avoiding them.
the failure modes that appear at scale
1. API throughput becomes the bottleneck
Five agents hitting the Anthropic API simultaneously is usually fine. Twenty agents, each running long-context tasks, can saturate your per-minute rate limits, especially during the heavy context-loading phase at the start of each session. When that happens, agents stall, you get rate-limit errors mid-task, and the "parallel" run becomes a staggered one you didn't plan for.
This is fixable with backoff logic or by scheduling agents to start in waves, but it's infrastructure work that has nothing to do with the feature you're building, and it falls on you.
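A minimal sketch of both mitigations, assuming each agent launch is wrapped in a function you control and that a rate-limit failure surfaces as an exception. `RateLimited`, `with_backoff`, `start_in_waves`, and the default wave size and gap are all hypothetical illustrations, not part of the Anthropic SDK or Claude Code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for whatever rate-limit error your client raises."""


def with_backoff(call: Callable[[], T], max_retries: int = 5,
                 base: float = 1.0, cap: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited using exponential backoff with full jitter."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt >= max_retries:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
            attempt += 1


def start_in_waves(launchers: list[Callable[[], None]], wave_size: int = 4,
                   gap_seconds: float = 90.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Start agents in waves so their context-loading phases don't all overlap.

    Returns the number of waves used.
    """
    waves = 0
    for i in range(0, len(launchers), wave_size):
        if i > 0:
            sleep(gap_seconds)  # give the previous wave time to load context
        for launch in launchers[i:i + wave_size]:
            launch()
        waves += 1
    return waves
```

Passing `sleep` as a parameter only exists to keep the functions testable; in practice you'd leave the default. The right wave size and gap depend on your rate-limit tier and how much context each agent loads.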
2. machine resources cap out sooner than you expect
Each Claude Code agent is a PTY session running a full claude process. On a well-specced MacBook Pro, three to five is comfortable. At ten, you start to feel it. Twenty is a memory and CPU problem: you're running what amounts to twenty active terminal sessions plus the builds, test runs, and file I/O those sessions trigger. The machine slows down, context-switching becomes sluggish, and the environment itself becomes a source of noise.
If you're not watching system resources while running at high agent counts, you're probably already past the point where the host is hurting the runs.
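One cheap guard is to check host load before launching another agent. A minimal sketch using only the standard library; the 0.8 load-per-CPU threshold is an illustrative assumption rather than a measured limit, and `os.getloadavg()` is available on macOS and Linux but not Windows.

```python
import os


def load_per_cpu() -> float:
    """1-minute load average divided by the number of logical CPUs."""
    load_1min, _, _ = os.getloadavg()  # Unix-only
    return load_1min / (os.cpu_count() or 1)


def can_start_another_agent(current_load_per_cpu: float,
                            max_load_per_cpu: float = 0.8) -> bool:
    """True if the host has headroom for one more agent by this (CPU-only) measure."""
    return current_load_per_cpu < max_load_per_cpu
```

Usage would look like `if can_start_another_agent(load_per_cpu()): ...`. This only looks at CPU load; memory pressure needs a separate check, for example via the third-party `psutil` package's `virtual_memory()`.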
3. task decomposition becomes intractable
Decomposing a feature into five truly independent tasks is hard. Decomposing it into twenty is harder still, and the difficulty grows faster than linearly. The dependencies between tasks multiply; the interfaces that tasks need to share get finer-grained; the risk of two agents working on the same module in different directions grows with each additional task you carve out.
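One rough way to see the non-linearity, assuming any pair of tasks can potentially share an interface or touch the same module, is to count the task pairs you have to reason about:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{5}{2} = 10, \qquad
\binom{20}{2} = 190
```

Four times the tasks means nineteen times the potential pairwise interactions. Real dependency graphs are sparser than this worst case, but the count of pairs you must at least rule out still grows quadratically.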
Most people who try to run twenty agents simultaneously don't actually decompose correctly for twenty tasks. They decompose for five and hand the remaining agents under-specified work. Those agents then make assumptions (about the interface shape, the conventions, what the other agents are doing), and those assumptions collide at merge time.
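A decomposition can be checked for the two most common gaps before any agent starts: briefs with no goal or acceptance criteria, and files claimed by more than one task. A minimal sketch; `TaskBrief` and its field names are a hypothetical brief format, not something Claude Code defines.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class TaskBrief:
    """Hypothetical shape for a parallel task brief; field names are illustrative."""
    name: str
    goal: str = ""
    owned_paths: set[str] = field(default_factory=set)  # files only this task may edit
    done_when: str = ""                                  # acceptance criteria


def check_decomposition(briefs: list[TaskBrief]) -> list[str]:
    """Return problems that tend to surface later as merge-time collisions."""
    problems = []
    for b in briefs:
        for attr in ("goal", "done_when"):
            if not getattr(b, attr).strip():
                problems.append(f"{b.name}: missing {attr}")
        if not b.owned_paths:
            problems.append(f"{b.name}: no owned paths")
    # Two tasks owning the same path means two agents may edit it in different directions.
    for a, b in combinations(briefs, 2):
        for path in sorted(a.owned_paths & b.owned_paths):
            problems.append(f"{a.name} and {b.name} both own {path}")
    return problems
```

An empty list doesn't prove the decomposition is sound, but a non-empty one is a cheap signal that some agents are about to start from guesses.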
4. supervision collapses
At three to five agents, you can maintain a working mental model of what each one is doing. You check in periodically, catch the drift when it starts, redirect before it compounds. The supervision is real, even if it's imperfect.
At twenty agents, this breaks. You don't know what most of them are doing. Errors propagate for a long time before you see them because your attention is too distributed to catch them early. An agent that's been building against a stale assumption for 30 minutes has produced a lot of output before you notice, and all of that output needs to be re-evaluated or discarded.
the typical twenty-agent session: you're watching two or three closely, dimly aware of the rest. the ones you're not watching are making decisions. some of those decisions are wrong, and you'll find out when you try to merge.
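One way to make the supervision gap visible rather than just felt is to record when you last actually read each agent's output and list the ones past a threshold. A minimal sketch; the 15-minute default is an assumption you'd tune, and how `last_reviewed` gets recorded depends entirely on your setup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AgentStatus:
    name: str
    last_reviewed: datetime  # last time a human actually read this agent's output


def unsupervised(agents: list[AgentStatus], now: datetime,
                 limit: timedelta = timedelta(minutes=15)) -> list[str]:
    """Names of agents nobody has looked at within `limit`, most stale first."""
    stale = [a for a in agents if now - a.last_reviewed > limit]
    stale.sort(key=lambda a: a.last_reviewed)
    return [a.name for a in stale]
```

If this list is regularly longer than the set of agents you're actively watching, the agent count has outrun your supervision.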
5. context cascade failures multiply
Each agent's context window is per-session — it holds what that agent has read and done, nothing from the other agents running in parallel. When you're running five agents and you make a decision that affects all of them, you can re-brief all five in a few minutes. When you're running twenty, re-briefing twenty agents is a significant chunk of time, and if you don't do it completely you have a mixed state: some agents operating on the new decision, some on the old one, and you don't know which is which.
We wrote about this in detail in keeping context and decisions consistent across parallel AI agents. The short version is that you can't add your way out of this problem: the context coherence requirement grows linearly with agent count, and the cost of maintaining it falls entirely on you.
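The mixed state is dangerous mainly because it's invisible. A minimal sketch of making it visible, assuming you number each cross-cutting decision and record the last version each agent has acknowledged; both pieces of bookkeeping are hypothetical, not a Claude Code feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    version: int  # monotonically increasing
    text: str


def missing_decisions(decisions: list[Decision],
                      acked: dict[str, int]) -> dict[str, list[str]]:
    """For each agent, the decisions newer than the last version it acknowledged.

    Agents with an empty list are current; everyone else is working from an older state.
    """
    return {
        agent: [d.text for d in decisions if d.version > last]
        for agent, last in acked.items()
    }
```

This tells you which agents need a re-brief and what to tell them; it doesn't reduce the re-briefing itself.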
6. the git merge ceremony becomes a project of its own
Worktrees give each agent an isolated environment. That's correct — without them you'd have file conflicts immediately. But isolation defers the merge, it doesn't eliminate it. Twenty worktrees means twenty branches with diverging state, some of which will have touched the same files from different angles, and all of which need to be sequenced back into a coherent main branch.
Merging twenty branches isn't twenty times as hard as merging one branch. It's harder than that, because the interactions between branches accumulate. If you've run twenty agents on a feature for an afternoon, you may spend the next morning just doing the merge work — and the cognitive load of holding twenty changesets in your head simultaneously while resolving conflicts is substantial.
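File-level overlap is visible before you start merging. A minimal sketch, assuming each agent's worktree sits on its own branch off `main`: `changed_files` uses `git diff --name-only base...branch` (the three-dot form diffs against the merge base), and `merge_plan` flags branch pairs that share files and proposes an order. The least-entangled-first ordering is an illustrative heuristic, not a git feature, and the branch names in the test are hypothetical.

```python
import subprocess
from itertools import combinations


def changed_files(branch: str, base: str = "main") -> set[str]:
    """Files changed on `branch` since it diverged from `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line for line in out.splitlines() if line}


def merge_plan(changes: dict[str, set[str]]) -> tuple[list[str], list[tuple[str, str, set[str]]]]:
    """Order branches least-entangled first and list the pairs that share files.

    Branches that overlap with fewer other branches go first, so the risky
    merges happen later against a more settled main.
    """
    overlaps = [(a, b, changes[a] & changes[b])
                for a, b in combinations(sorted(changes), 2)
                if changes[a] & changes[b]]
    degree = {branch: 0 for branch in changes}
    for a, b, _ in overlaps:
        degree[a] += 1
        degree[b] += 1
    order = sorted(changes, key=lambda br: (degree[br], br))
    return order, overlaps
```

In a real repo you'd build the input with `{b: changed_files(b) for b in branches}`. This only catches file-level overlap; two branches can still conflict semantically without touching the same file.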
the diagnostic question
Before pushing to higher agent counts, one question cuts through the intuition:
Of your N agents, how many are actually making progress versus waiting on you?
If three of your five agents are blocked waiting for your review, your re-brief, or your arbitration of a conflict, your five agents are delivering the throughput of two. Adding more agents doesn't fix this. It adds more agents waiting in the queue.
The throughput ceiling in most parallel-agent setups isn't the number of agents. It's the rate at which one human can review output, make decisions, and maintain shared context. Adding agents past that ceiling produces waiting agents and deferred work, not faster delivery.
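The diagnostic question is easy to answer mechanically if each agent carries a state label. A minimal sketch; the state names are hypothetical and would come from however you track your agents. With the five-agent example above, it reports two progressing and three waiting.

```python
from collections import Counter

# Illustrative labels for "blocked on the human"; adapt to your own tracking.
WAITING_STATES = ("needs_review", "needs_rebrief", "needs_decision")


def effective_parallelism(states: dict[str, str]) -> tuple[int, int]:
    """Return (agents making progress on their own, agents waiting on a human)."""
    counts = Counter(states.values())
    waiting = sum(counts[s] for s in WAITING_STATES)
    return counts["working"], waiting
```

If the second number is regularly larger than the first, adding agents will mostly lengthen the queue.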
a practical ceiling: most engineers who run parallel Claude Code agents seriously land at 3–5 as the stable operating range. not because more agents can't produce output, but because more agents can't be supervised effectively by one person. the number that feels chaotic is usually a signal that the human bottleneck has been exceeded, not that the agents are broken.
what to do instead
fewer, better-structured agents
The return on running more agents diminishes well before 20. The return on running fewer agents more carefully is higher than it looks. Three agents with clear, complete task briefs, well-separated concerns, and one human who can actually supervise all three will routinely outperform ten agents with overlapping work and a supervision gap.
The question is not "how many agents can I spawn?" It's "what's the maximum number I can supervise well?" For most people working alone on a codebase, that's somewhere between three and five.
lock the design before you fan out
The biggest cause of agent conflicts at scale is premature decomposition — breaking work into parallel tasks before the interface decisions are settled. An agent that starts building the data layer before the API contract is fixed will produce work that needs to be re-done the moment the contract is resolved.
The pattern that works: settle the design and the key interface decisions yourself, or with one dedicated planning agent, before any implementation agent starts. Decompose into parallel tasks against a fixed specification, not a live one. The upfront time pays back several times over in avoided merge conflicts and rework. (This is what a structured pipeline enforces by construction.)
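One lightweight way to make "fixed, not live" checkable, assuming the design lives in a single spec file: record a hash of the spec when you sign it off, put that hash in every implementation brief, and refuse to fan out if the spec on disk no longer matches. The helper names and the 12-character fingerprint length are illustrative choices.

```python
import hashlib
from pathlib import Path


def spec_fingerprint(spec_text: str) -> str:
    """Short content hash identifying one exact version of the specification."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()[:12]


def ready_to_fan_out(spec_path: Path, frozen_fingerprint: str) -> bool:
    """True only if the spec on disk is the version that was signed off.

    If it has changed since, the decomposition should be redone before any
    implementation agent starts.
    """
    return spec_fingerprint(spec_path.read_text(encoding="utf-8")) == frozen_fingerprint
```

A hash can't tell you whether the spec is good, only whether it's the one everyone agreed to build against.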
automated review before it reaches you
If you're reviewing all agent output yourself, you're a single-threaded bottleneck in what's supposed to be a parallel system. Some of that review can be handed to tooling and agents: type checks, test runs, linting, security scans, architecture review against the specification. These can run automatically when an agent finishes, before the output reaches your queue.
What remains for you to review is higher-signal: intent alignment, design judgment, the things that require your context about the product. Everything mechanical is pre-filtered. The queue you're looking at is shorter and more meaningful.
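A minimal sketch of the mechanical part of that gate, assuming each check is a shell command run in the agent's worktree and signals failure with a non-zero exit code. It covers only command-line checks; an agent-driven architecture review would be a separate step.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: list[tuple[str, list[str]]], cwd: str = ".") -> list[CheckResult]:
    """Run each (name, argv) check in `cwd` and capture pass/fail plus output."""
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
    return results


def ready_for_human_review(results: list[CheckResult]) -> bool:
    """Only output that clears every mechanical check reaches your queue."""
    return all(r.passed for r in results)
```

Which commands go in `checks` depends on your stack; `["npx", "tsc", "--noEmit"]`, `["pytest", "-q"]`, and `["ruff", "check", "."]` are typical examples. Keeping the captured output means a failed check can go straight back to the agent instead of to you.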
shared context, not parallel re-briefing
Every decision you make in a parallel-agent session that needs to reach multiple agents requires either re-briefing each one yourself (which doesn't scale) or a shared source that the agents read from. CLAUDE.md handles the static layer: conventions, architecture, the things set before you start. The dynamic layer (decisions made mid-session, the current interface shape, what other agents are doing) needs somewhere else to live.
Without it, you're the only mechanism for propagating mid-session decisions to all agents, and that breaks at agent counts above five or six.
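A minimal sketch of such a dynamic layer: a single append-only decisions file that each agent is told (for example, in CLAUDE.md) to re-read before starting a new step. The file name, JSONL format, and helper names are hypothetical, and the sketch assumes you are the only writer, since concurrent appends would need file locking.

```python
import json
import time
from pathlib import Path


def record_decision(log: Path, text: str, author: str = "human") -> int:
    """Append a decision to the shared JSONL log and return its sequence number."""
    existing = log.read_text(encoding="utf-8").splitlines() if log.exists() else []
    seq = len(existing) + 1
    entry = {"seq": seq, "at": time.time(), "author": author, "text": text}
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return seq


def decisions_since(log: Path, after_seq: int) -> list[dict]:
    """Decisions an agent hasn't seen yet, given the last sequence number it read."""
    if not log.exists():
        return []
    lines = log.read_text(encoding="utf-8").splitlines()
    return [e for e in (json.loads(line) for line in lines if line) if e["seq"] > after_seq]
```

An agent that remembers the last `seq` it read only needs the delta, which keeps the re-read cheap as the log grows.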
the structure question
The real leverage isn't in tuning the agent count. It's in the structure that surrounds the agents — the ordering of work, the gates between stages, the shared state, the automated checks. That structure is what determines whether agent count is a dial you can turn up or a threshold you can't cross without chaos.
This is what defract is built around: a pipeline (scope → design → architecture → implementation → review → release) that locks the design before implementation fans out, deploys review agents before the output reaches you, and maintains shared memory of decisions across all agents. The parallel implementation agents still run in worktrees; the pipeline gives them a fixed surface to build against rather than a moving one.
The goal isn't to eliminate parallelism. It's to run parallelism where it's actually safe — at the implementation stage, against a locked design — and not before.
See also: keeping context and decisions consistent across parallel AI agents and the cognitive load of running parallel Claude Code agents for the adjacent failure modes.
defract is in open beta
a structured pipeline for your parallel Claude Code agents: design locked before implementation fans out, review agents before the output reaches you, shared memory across sessions. free, no caps, no signup.