Giving AI agents roles: PM, architect, reviewer, QA

If you run multi-agent setups, you have probably hit the same question everyone else does: how do you actually parameterize a PM agent versus a senior-dev agent? People want roles (a scoper, an architect, an implementer, a reviewer), but what they have is one generalist agent doing all of it. The instinct is right. The hard part is knowing what each role owns, what it gets as input, and what it hands off. That is the whole game, and it is mostly a question of inputs and boundaries, not clever prompts.

Start with why one generalist agent is the wrong default for anything non-trivial.

The problem with one agent doing everything

A single generalist agent collapses scoping, design, implementation, and review into one context window. That sounds efficient. It is not, for two specific reasons.

First, it anchors to its own first idea. The moment the agent reads a vague request, it forms an interpretation and starts building toward it. The scoping, the design, and the code all flow from that initial read, and nothing in the session ever challenges it. There is no point at which a fresh perspective gets to ask "is this even the right thing to build?" The first idea becomes the only idea.

Second, it reviews its own work. When the same context that wrote the code is asked to check the code, it grades its own homework. It is fluent in the assumptions it just made, so the bugs that come from those assumptions are invisible to it. Self-review catches typos. It does not catch "this whole approach is wrong," because the reviewer and the author share the same blind spot.

This is the same reason human teams separate concerns. A product manager scopes, an engineer designs, someone else implements, and a different person reviews — not because one person could not do all four, but because the hand-offs force the work to survive a fresh set of eyes at each stage. Roles are a way to manufacture that separation on purpose.

The four roles and how to brief each

A role is defined by three things: what it owns, what it gets as input, and what it hands off. Keep each brief narrow. The narrower the brief, the less room the agent has to wander into the next role's job.
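One way to keep briefs narrow is to write them down as data before turning them into prompts. The sketch below is only an illustration: `RoleBrief`, its field names, and `render` are hypothetical names, and the wording of each entry paraphrases the role descriptions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleBrief:
    """One role: what it owns, what it receives, what it hands off."""
    name: str
    owns: str
    takes: str       # the fixed input the role starts from
    hands_off: str   # the artifact the next role receives
    must_not: str    # the boundary that keeps the role in its lane

    def render(self) -> str:
        # Flatten the brief into plain text suitable for a system prompt.
        return "\n".join([
            f"Role: {self.name}",
            f"You own: {self.owns}",
            f"Your input: {self.takes}",
            f"You hand off: {self.hands_off}",
            f"You must not: {self.must_not}",
        ])


# Hypothetical wording, paraphrased from the role descriptions in this section.
ROLES = (
    RoleBrief("PM / scoper",
              "turning a vague request into a scoped story with acceptance criteria",
              "the raw request plus product context",
              "a story",
              "propose an implementation or name files or functions"),
    RoleBrief("Architect",
              "turning the story into a technical plan",
              "the story plus a map of the codebase",
              "a plan",
              "write production code"),
    RoleBrief("Implementer",
              "building against the fixed plan",
              "the plan",
              "a diff",
              "renegotiate the plan mid-stream"),
    RoleBrief("Reviewer / QA",
              "checking the diff",
              "the diff and the acceptance criteria",
              "a pass, or a list of problems",
              "share a context with whoever wrote the code"),
)
```

Calling `ROLES[0].render()` gives the text for the scoper's brief. The `must_not` field is the part that tends to get forgotten when briefs are written freehand.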

PM / scoper

Owns: turning a vague request into a scoped story with acceptance criteria. Input: the raw request plus product context — what the product does, who it is for, what is out of scope. Hands off: a story. The brief here is deliberately not technical. Tell this agent it is forbidden from proposing an implementation. Its only job is to decide what "done" means and to write acceptance criteria a different agent could verify against. If it starts naming files or functions, the brief is too loose.
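The "naming files or functions" signal can be checked mechanically before the story moves on. This is a rough heuristic sketch, not a complete detector: the function name `implementation_leaks`, the two patterns, and the list of file extensions are all assumptions chosen for illustration.

```python
import re

# Assumed patterns: a path-like token ending in a common source extension,
# and an identifier followed by "()". Both are deliberately crude.
FILE_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|tsx|ts|js|go|rs|java|kt|swift|cpp|c|h)\b")
CALL_PATTERN = re.compile(r"\b[A-Za-z_]\w*\(\)")


def implementation_leaks(story: str) -> list[str]:
    """Return file-like and function-like tokens found in a story."""
    hits = set(FILE_PATTERN.findall(story)) | set(CALL_PATTERN.findall(story))
    return sorted(hits)


clean = "Users can reset their password by email."
leaky = "Update reset_password() in auth/reset.py to send the email."

print(implementation_leaks(clean))   # []
print(implementation_leaks(leaky))   # ['auth/reset.py', 'reset_password()']
```

A non-empty result does not prove the story is wrong. It is a prompt to tighten the scoper's brief and rerun it.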

Architect

Owns: turning the story into a technical plan. Input: the story plus a map of the codebase — the relevant modules, interfaces, and existing conventions. Hands off: a plan that names the files to touch, the interfaces to add or change, and the blast radius. This agent does not write production code. It decides where the work goes and what it touches, so that implementation becomes mechanical. A good plan is one an implementer can follow without making design decisions of its own.

Implementer

Owns: building against the fixed plan. Input: the plan, nothing more open-ended than that. Hands off: a diff. The key constraint is that the implementer builds against a plan it did not write and is not allowed to renegotiate mid-stream. If the plan turns out to be wrong, that is a signal to go back to the architect role, not to improvise. Implementers are also where parallelism lives: several can run at once, ideally in isolated git worktrees, each against its own slice of the plan. Sharing context cleanly across those parallel implementers is its own problem, covered in "keeping context and decisions consistent across parallel AI agents."
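Setting up one worktree per implementer can be scripted. The sketch below only builds the `git worktree add` commands and does not run them. The `agent/<slice>` branch naming, the `../worktrees` directory, and the `main` base branch are assumptions to adjust for your repository.

```python
from pathlib import PurePosixPath


def worktree_commands(slices, base="main", root="../worktrees"):
    """Build one `git worktree add` command per plan slice (not executed here)."""
    commands = []
    for name in slices:
        branch = f"agent/{name}"                  # assumed branch naming scheme
        path = str(PurePosixPath(root) / name)    # one directory per implementer
        # git worktree add -b <new-branch> <path> <commit-ish>
        commands.append(["git", "worktree", "add", "-b", branch, path, base])
    return commands


for cmd in worktree_commands(["api", "ui"]):
    print(" ".join(cmd))
```

Each command can then be passed to `subprocess.run(cmd, check=True)` from the repository root. Every implementer gets its own checkout and branch, so parallel diffs do not overwrite each other's working files.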

Reviewer / QA

Owns: checking the diff. Input: the diff and the acceptance criteria — not the author's context. Hands off: a pass, or a list of problems. This is the role that most justifies the whole exercise, and it has one non-negotiable rule: it must be a separate context from whoever wrote the code. Give it an adversarial brief. Its job is to find what is wrong: type errors, lint failures, missing tests, architecture that drifted from the plan, acceptance criteria that are not actually met. A reviewer that shares the author's session is not a reviewer.
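One way to enforce the separate-context rule is to give the reviewer a constructor that has nowhere to put the author's conversation. A minimal sketch, with `ReviewInput` and `build_review_prompt` as hypothetical names and the prompt wording as an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewInput:
    """Everything the reviewer is allowed to see. There is deliberately
    no field for the author's transcript or reasoning."""
    diff: str
    acceptance_criteria: tuple[str, ...]


def build_review_prompt(review: ReviewInput) -> str:
    # Render criteria as a checklist the reviewer must verify one by one.
    criteria = "\n".join(f"- {item}" for item in review.acceptance_criteria)
    return (
        "You are reviewing a change you did not write. "
        "Your job is to find what is wrong.\n"
        "Check for type errors, lint failures, missing tests, "
        "and acceptance criteria that are not actually met.\n"
        "Reply with PASS or a list of problems.\n\n"
        f"Acceptance criteria:\n{criteria}\n\n"
        f"Diff:\n{review.diff}\n"
    )
```

The prompt is sent to a fresh session. Because the input type carries only the diff and the criteria, reusing the author's context takes a deliberate act rather than a shortcut taken in a hurry.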

The two things that make roles work: a clean hand-off artifact between stages — a story, a plan, a diff — so each role starts from a fixed input rather than a running conversation, and a reviewer that is genuinely independent of the author. Get those two right and the rest is detail. Get them wrong and you have one generalist agent wearing four hats.

Be honest about the cost

Roles are not free. Each hand-off is a context you have to set up, an artifact you have to pass, and a stage that can stall waiting for the previous one. For a one-line change — rename a variable, fix an obvious typo, bump a dependency — the role overhead is pure waste. You would spend more time briefing a PM agent than the change is worth. Just ask one agent to do it.

The split pays off above a complexity threshold. When a task is ambiguous enough that scoping it wrong is expensive, or large enough that the design decisions matter, or risky enough that a missed bug costs real time, the overhead earns its keep. The independent review alone often catches the kind of mistake that would have taken an afternoon to find later. Below that threshold, roles are ceremony. Above it, they are the difference between a coherent change and a plausible-looking one. Knowing where your threshold is comes from running it both ways a few times.
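The threshold can be written down as a routing rule, if only to make the decision explicit. This is a deliberately crude sketch: `Task`, its three flags, and `needs_roles` are hypothetical, and judging whether a task is ambiguous, design-heavy, or risky remains a human call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    ambiguous: bool      # scoping it wrong would be expensive
    design_heavy: bool   # large enough that design decisions matter
    risky: bool          # a missed bug would cost real time


def needs_roles(task: Task) -> bool:
    """Any one of the three conditions is enough to justify the role split."""
    return task.ambiguous or task.design_heavy or task.risky


rename_variable = Task(ambiguous=False, design_heavy=False, risky=False)
new_billing_flow = Task(ambiguous=True, design_heavy=True, risky=True)

print(needs_roles(rename_variable))   # False
print(needs_roles(new_billing_flow))  # True
```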

From prompts to stages

The honest catch with all of this is that wiring roles by hand is fiddly. You are managing four briefs, passing artifacts between them, remembering to spin up a fresh context for review, and keeping each agent inside its lane. It works, but it is a lot of manual orchestration, and it is easy to let the reviewer quietly become the author again when you are in a hurry.

This is the structure defract encodes directly, so treat it as one way to do this rather than the only way. The roles above map onto a fixed lifecycle (story, design, architecture, implementation, review, release) where each role is a stage with a gate and a defined hand-off, instead of a prompt you re-assemble by hand every time. The story has to pass before architecture begins; the review is structurally a separate context from the implementation, so the author cannot grade its own work. It is the same separation of concerns argued for above, made the default rather than something you have to remember to set up. For how that compares to running a pool of parallel agents without the stages, see "defract vs Claude Code Agent Teams."
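To make the idea of gated stages concrete, here is a generic sketch of a lifecycle that refuses to advance until the current gate passes and refuses a review from the implementer's context. It is not defract's API or implementation: `Lifecycle`, `GateError`, and `complete` are hypothetical names, and only the stage list comes from the text above.

```python
STAGES = ("story", "design", "architecture", "implementation", "review", "release")


class GateError(Exception):
    """Raised when a stage cannot be completed."""


class Lifecycle:
    def __init__(self) -> None:
        self._index = 0
        self._contexts: dict[str, str] = {}   # stage name -> context id that ran it

    @property
    def stage(self) -> str:
        return STAGES[self._index]

    def complete(self, context_id: str, gate_passed: bool) -> str:
        """Close the current stage and return the name of the next one."""
        current = self.stage
        # The reviewer must not be the context that produced the diff.
        if current == "review" and context_id == self._contexts.get("implementation"):
            raise GateError("review must run in a different context from implementation")
        if not gate_passed:
            raise GateError(f"{current} gate did not pass; staying at {current}")
        self._contexts[current] = context_id
        if self._index < len(STAGES) - 1:
            self._index += 1
        return self.stage
```

With this shape, the quiet slide from reviewer back to author becomes an error instead of a habit: calling `complete` at the review stage with the implementer's context id raises `GateError`.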

However you build it, the test is the same: at each stage, is the work being judged by something other than the thing that produced it? If the answer is yes, the roles are doing their job. If it is no, you have one generalist agent with extra steps.

