How AI agents onboard to a legacy codebase you've never touched

Here is a case worth taking apart. A developer with no recent PHP or Laravel experience cloned an eight-year-old open-source project, pointed Claude Code at the open issues, and closed roughly ten bugs in under an hour. The speed is the part people repeat, and it's the least interesting part. The interesting part is the workflow that made it possible — because the workflow is repeatable, and the speed is not a guarantee.

Onboarding to an unfamiliar codebase is a context problem before it's a coding problem. A human engineer spends the first week building a mental model: where things live, how data moves, what the unwritten rules are. An agent has no week. It has a context window and whatever you put in it. The job is to front-load that model deliberately instead of hoping the agent reconstructs it from scratch on every task. Below is the loop, step by step, and where each step breaks.

1. Map before you touch

The first agent shouldn't fix anything. Its job is to read the repository and produce a structural map: the entry points, the major modules and what each is responsible for, how a request flows from edge to database and back, and the build and test commands. For the Laravel project that meant routes, controllers, the service layer, the ORM models, and the migration history — the skeleton you'd otherwise absorb by osmosis over days.

The discipline that matters here: write the map down as durable context, not a one-off chat reply. A map that lives only in one agent's session dies when the session ends, and the next agent rebuilds it from zero — slower, and differently each time. Put it in a file the repository keeps (CLAUDE.md, an ARCHITECTURE.md, or equivalent) so every later agent starts from the same shared picture. This is the same lesson as keeping context and decisions consistent across parallel agents: context that lives in one head, human or model, doesn't survive contact with the next task.
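As a rough illustration of what "durable" can mean in practice, here is a minimal Python sketch that holds the mapping agent's findings in a small structure and renders them as plain text, ready to be saved to a file such as ARCHITECTURE.md. The RepoMap schema, the render_map helper, and every path and command in EXAMPLE_MAP are assumptions made for illustration; they are not taken from the project described above.

```python
from dataclasses import dataclass


@dataclass
class RepoMap:
    """Structural map produced by the first, read-only agent (hypothetical schema)."""
    entry_points: list[str]
    modules: dict[str, str]    # path -> what that module is responsible for
    request_flow: list[str]    # ordered hops from the edge to the database
    commands: dict[str, str]   # purpose -> shell command


def render_map(repo_map: RepoMap) -> str:
    """Render the map as plain text so it can live in the repository."""
    lines = ["Architecture map", "", "Entry points"]
    lines += [f"- {path}" for path in repo_map.entry_points]
    lines += ["", "Modules"]
    lines += [f"- {path}: {role}" for path, role in repo_map.modules.items()]
    lines += ["", "Request flow"]
    lines += [f"{i}. {hop}" for i, hop in enumerate(repo_map.request_flow, start=1)]
    lines += ["", "Commands"]
    lines += [f"- {purpose}: {cmd}" for purpose, cmd in repo_map.commands.items()]
    return "\n".join(lines) + "\n"


# Illustrative values only: typical Laravel locations, not the real project's layout.
EXAMPLE_MAP = RepoMap(
    entry_points=["routes/web.php", "routes/api.php"],
    modules={
        "app/Http/Controllers": "HTTP request handling",
        "app/Services": "business logic",
        "app/Models": "ORM models",
        "database/migrations": "schema history",
    },
    request_flow=["route", "controller", "service", "model", "database"],
    commands={"install": "composer install", "test": "vendor/bin/phpunit"},
)
```

Saving the output of render_map(EXAMPLE_MAP) to a tracked file is what lets the next agent start from the same picture instead of rebuilding it.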

2. Capture the conventions, not just the structure

A structural map tells the agent where code is. It doesn't tell the agent how this project writes code. Eight-year-old projects have a house style — a naming scheme, a chosen way to handle errors, a test layout, a preferred way to query the database — and it's rarely documented. Left alone, an agent imports its own defaults: it writes idiomatic-for-the-model code that's foreign to the repository, and your diff reads like a transplant.

So make an explicit pass to extract the patterns. Have an agent answer, with file citations: How are errors handled here — exceptions, result objects, error returns? What's the naming convention for classes and methods? How are tests structured, and what do they assert against? Where does new code of this kind usually go? Capture the answers next to the map. The goal is that a change blends in, because a fix that matches the house style is a fix the maintainers will actually merge.

The fastest way to get a change rejected from an unfamiliar project is to make it correct but out of place. Conventions are not cosmetic; they are how a codebase stays legible to the people who own it.

3. Turn the issue tracker into scoped tasks

An open issue is not a task an agent can take. It's a symptom report, often missing repro steps, sometimes stale, sometimes three problems wearing one title. Before any fixing starts, do a triage pass: read the issues, discard the dead ones, and rewrite the live ones as scoped units — one clear outcome, the files likely involved, and a way to know it's done. The eight-year-old project's hour of fixes worked because the issues were small and independent, not because the agent was clever. Scope is what makes work parallelizable and what keeps an agent from wandering across half the codebase chasing one bug.
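One way to make "scoped" checkable is to give each rewritten issue a fixed shape and refuse to hand it to an agent until the shape is filled in. The sketch below is a minimal Python version under that assumption; ScopedTask, scoping_gaps, and are_independent are hypothetical names, and treating "no shared files" as independence is only a rough heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedTask:
    """One live issue rewritten as a unit an agent can take (hypothetical shape)."""
    issue_ref: str                                          # identifier of the source issue
    outcome: str                                            # the single clear outcome
    likely_files: list[str] = field(default_factory=list)   # files likely involved
    done_when: str = ""                                     # how to know it is done


def scoping_gaps(task: ScopedTask) -> list[str]:
    """Return the names of fields that still need filling before the task is ready."""
    gaps = []
    if not task.outcome.strip():
        gaps.append("outcome")
    if not task.likely_files:
        gaps.append("likely_files")
    if not task.done_when.strip():
        gaps.append("done_when")
    return gaps


def are_independent(a: ScopedTask, b: ScopedTask) -> bool:
    """Heuristic: two tasks can run in parallel if both list files and none overlap."""
    if not a.likely_files or not b.likely_files:
        return False  # unknown footprint, so do not assume independence
    return not set(a.likely_files) & set(b.likely_files)
```

A task with a non-empty scoping_gaps result goes back to triage rather than to an agent, and only pairs that pass are_independent are candidates for running side by side.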

4. Isolate each fix in its own worktree

Once you have a backlog of independent tasks, you can run several at once — but only if they can't step on each other. Give each agent its own git worktree so parallel fixes operate on separate checkouts of the same repository. Branch per task, isolated working directory, no shared mutable state. Two agents editing the same file in the same directory is a corruption waiting to happen; two agents in separate worktrees is just two branches you merge in sequence. This is also where you contain blast radius: a bad change is one branch you throw away, not a working tree you have to untangle.
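The bookkeeping this implies is small enough to sketch. The Python below only builds the git command lines for one branch and one worktree per task; it does not run them. The fix/ branch prefix, the ../worktrees directory, and main as the base branch are assumptions for illustration, as are the helper names.

```python
import re
from pathlib import PurePosixPath


def branch_name(task_id: str) -> str:
    """Derive a branch name from a task id (the 'fix/' prefix is an assumed convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", task_id.lower()).strip("-")
    return f"fix/{slug}"


def worktree_path(task_id: str, worktrees_root: str = "../worktrees") -> str:
    """One directory per task, kept outside the main checkout."""
    return str(PurePosixPath(worktrees_root) / branch_name(task_id).replace("/", "-"))


def worktree_add_command(task_id: str, worktrees_root: str = "../worktrees",
                         base: str = "main") -> list[str]:
    """Command that creates a new branch from `base` in its own working directory."""
    return ["git", "worktree", "add", "-b", branch_name(task_id),
            worktree_path(task_id, worktrees_root), base]


def discard_commands(task_id: str, worktrees_root: str = "../worktrees") -> list[list[str]]:
    """Commands that throw a bad attempt away: remove the worktree, then its branch."""
    return [
        ["git", "worktree", "remove", "--force", worktree_path(task_id, worktrees_root)],
        ["git", "branch", "-D", branch_name(task_id)],
    ]
```

Each list can be passed to subprocess.run from the main checkout. Keeping command construction separate from execution makes the plan easy to review before anything touches the repository.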

5. Validate every fix — "fixed" is a hypothesis

An agent reporting a bug as fixed is making a claim, not stating a fact. Treat it as one. Every change runs against the existing test suite and, where a bug has reproduction steps, against those too. If the project's tests are thin, which is common in older codebases, the agent's first job on that task is to write a failing test that captures the bug, then make it pass. No green signal, no merge. This step is what separates ten real fixes from ten plausible-looking diffs, and it's the one most likely to get skipped in the excitement of watching things move fast.
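The rule "no green signal, no merge" can be written down as an explicit gate. The Python sketch below assumes three observations per fix: the reproducing test failed before the change, it passes after, and the existing suite still passes. FixEvidence, merge_decision, and passes are hypothetical names, and the test command itself is whatever the project uses.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


def passes(command: list[str], cwd: Optional[str] = None) -> bool:
    """Run a test command and report whether it exited with status 0."""
    return subprocess.run(command, cwd=cwd, capture_output=True).returncode == 0


@dataclass
class FixEvidence:
    """What was actually observed for one fix, as opposed to what the agent reported."""
    repro_failed_before_fix: bool
    repro_passes_after_fix: bool
    suite_passes_after_fix: bool


def merge_decision(evidence: FixEvidence) -> tuple[bool, str]:
    """Return (ok_to_merge, reason); every check has to hold."""
    if not evidence.repro_failed_before_fix:
        return False, "repro test never failed, so it does not capture the bug"
    if not evidence.repro_passes_after_fix:
        return False, "repro test still fails after the change"
    if not evidence.suite_passes_after_fix:
        return False, "existing test suite fails after the change"
    return True, "repro test went from red to green and the suite passes"
```

The first check is the one that is easy to forget: a new test that passes before the fix is applied proves nothing about the bug. Note that this gate only checks the signals described above; it does not judge whether the fix respects the code's intent.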

Where this approach fails

Be clear-eyed about the limits, because the failure modes are specific and they don't announce themselves.

Shallow fixes that miss intent. An agent will happily silence a symptom — swallow the exception, special-case the failing input — without grasping why the code was written that way. The test goes green; the actual contract is now subtly wrong.
Confidently wrong changes. The output is fluent and self-assured at exactly the moments it's mistaken. There's no built-in signal that distinguishes a deep fix from a convincing one. You supply that signal.
Missing implicit domain knowledge. The map and the conventions capture what's in the code. They don't capture why a tax calculation rounds the way it does, or which "obvious" refactor will break a downstream consumer nobody documented. That knowledge lives in maintainers' heads, and the agent can't read it.

The conclusion isn't that the workflow doesn't work. It's that review matters more here, not less. When you don't know the codebase either, you can't rely on a gut reaction to catch a wrong change — which is exactly why the structure above front-loads so much. The map, the conventions, the tests, the isolation: each one is a place where a wrong change has to pass through something before it reaches the main branch. The workflow doesn't remove the need for judgment. It gives your judgment more to work with.

Making the loop repeatable

Run this once by hand and it works. Run it on every new repository and you'll notice the friction: the map and conventions live in files you have to remember to keep current, each agent still re-establishes context at the start of every session, and the worktree bookkeeping is yours to manage. The workflow is sound; the manual upkeep is the tax.

This is the loop defract is built to make repeatable; treat it as one approach, not the only one. It keeps a persistent memory of the codebase and its conventions so each agent starts current instead of rebuilding the map, runs work through a gated pipeline from story through review, and isolates implementation in worktrees by default. If you're weighing it against a plain parallel runner, the defract vs Conductor comparison draws that line. The point is to stop re-establishing context by hand every session, so the onboarding you did once keeps paying off on the next bug, and the one after that.
