what is spec-driven development? (with ai coding agents)

Most AI coding workflows start the same way - you open the agent, describe what you want in a sentence or two, and watch it write code. It feels fast. Then the diff comes back and it built the wrong thing, or the right thing the wrong way, and you spend the next hour correcting it through follow-up prompts. The agent was never confused about how to write the code. It was confused about what you actually wanted.

Spec-driven development is a direct response to that failure mode. Instead of prompting an agent straight to code, you first produce a spec - a written description of the requirements, the design, and the tasks - and you correct that spec until it is right. Only then does the spec drive implementation. The decision about what to build happens once, explicitly, on paper, before any code exists.

what spec-driven development actually means

The term has become popular through tools like AWS Kiro, which put "specs" directly in the IDE, and GitHub's spec-kit, which brings the same idea to Copilot, Claude Code, and other agents. But the idea is older than any of these tools - writing down what a system should do before building it is just engineering, and the AI tooling is a new wrapper around an old discipline.

The shared shape across the tools is a three-part artifact, usually written as plain markdown:

requirements - what the feature does, who uses it, what success looks like, and the acceptance criteria. This is behavior, not implementation. No stack, no file names.
design - the technical plan that satisfies those requirements. Architecture, data models, the interfaces involved, the constraints and standards the code has to respect.
tasks - the design broken into small, reviewable, testable units. Each one is concrete enough to hand to an agent and check on its own, like "add a registration endpoint that validates email format and rejects duplicates."

Each part feeds the next. Requirements constrain the design, the design decomposes into tasks, and the tasks are what the agent implements. The human reviews and corrects at each step, so by the time code generation starts, the agent is working from a document you have already agreed with rather than guessing at thousands of unstated details from a one-line prompt.
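
One way to picture that ordering is as a small data structure where each part carries its own sign-off and nothing can be approved ahead of the part before it. This is a minimal sketch, not how Kiro or spec-kit store specs; the names `Section`, `Spec`, `approve`, and `ready_for_implementation` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One part of the spec plus the human's sign-off on it (hypothetical shape)."""
    name: str
    body: str
    approved: bool = False


@dataclass
class Spec:
    requirements: Section
    design: Section
    tasks: list[str] = field(default_factory=list)
    tasks_approved: bool = False

    def approve(self, part: str) -> None:
        """Approve one part, refusing to skip past an unapproved earlier part."""
        if part == "requirements":
            self.requirements.approved = True
        elif part == "design":
            if not self.requirements.approved:
                raise ValueError("approve requirements before design")
            self.design.approved = True
        elif part == "tasks":
            if not self.design.approved:
                raise ValueError("approve design before tasks")
            self.tasks_approved = True
        else:
            raise ValueError(f"unknown part: {part}")

    def ready_for_implementation(self) -> bool:
        """Code generation starts only once all three parts are signed off."""
        return (
            self.requirements.approved
            and self.design.approved
            and self.tasks_approved
        )
```

The useful property is the refusal: calling `approve("design")` on a spec whose requirements are still unreviewed raises instead of quietly moving on, which is the same order the human review is supposed to follow.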

why this matters now

For most of software's history, the expensive, slow part was writing the code. Specs felt like overhead because the typing was the bottleneck and you could just refactor your way out of a misunderstanding. The economics have changed. When an agent can produce a feature's worth of code in minutes, writing the code is no longer the constraint. The bottleneck moved to deciding precisely what to build.

A spec is how you make that decision once, in a form both you and the agent can read. Vague intent forces the model to fill in gaps, and it fills them with plausible guesses that you only discover are wrong after the code is written. A spec moves the guessing forward, into a cheap document, where a wrong assumption costs a sentence to fix instead of a sprint to unwind.

The point of a spec is not documentation. It is to surface the disagreement between what you meant and what the agent understood while that disagreement is still one paragraph instead of two thousand lines.

where it helps

you review intent, not diffs

Reviewing a large diff is reverse-engineering - you read the code and try to reconstruct what the author intended, then judge whether that intent was correct. A spec inverts the order. You agree on the intent first, in language, then the implementation is checked against an intent you already approved. Reading "validate email format and reject duplicates" and deciding it is right takes seconds. Reading the endpoint, the validation, and the database query to infer the same thing takes much longer, and you can still miss the gap.
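
To make the contrast concrete, here is a rough sketch of that one-line task as code, with the acceptance criteria written as checks next to it. Everything here is an assumption for illustration: the `register` function, the in-memory `existing` set standing in for a database, the deliberately simple email pattern, and the choice to treat addresses as case-insensitive when looking for duplicates.

```python
import re

# Deliberately simple pattern for illustration; real-world email validation is messier.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class RegistrationError(Exception):
    """Raised when a registration breaks one of the acceptance criteria."""


def register(email: str, existing: set[str]) -> str:
    """Validate the email format, reject duplicates, and record the new user."""
    # Assumption: duplicates are compared case-insensitively.
    normalized = email.strip().lower()
    if not EMAIL_PATTERN.match(normalized):
        raise RegistrationError("invalid email format")
    if normalized in existing:
        raise RegistrationError("duplicate email")
    existing.add(normalized)
    return normalized


def check_acceptance_criteria() -> None:
    """The sentence from the spec, restated as executable checks."""
    users: set[str] = set()

    # A well-formed address is accepted.
    assert register("Ada@example.com", users) == "ada@example.com"

    # Malformed addresses are rejected.
    for bad in ("not-an-email", "a@b", "two@@example.com"):
        try:
            register(bad, users)
        except RegistrationError:
            continue
        raise AssertionError(f"accepted malformed address: {bad}")

    # A second registration with the same address is rejected.
    try:
        register("ada@example.com", users)
    except RegistrationError as err:
        assert str(err) == "duplicate email"
    else:
        raise AssertionError("accepted a duplicate address")
```

Even in a toy like this, the code makes decisions the sentence never mentioned, such as whether case matters for duplicates. Those are exactly the gaps a reviewer has to infer from a diff and can simply read off a spec.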

you catch the wrong build before you pay for it

The most expensive mistake in AI-assisted work is building the wrong thing well. A spec is the cheapest place to catch it. If the requirements say the feature should do X and you wanted Y, you find out before a single task runs, not after the agent has produced a polished, tested, completely misaimed implementation.

you can parallelize honestly

Once a design is decomposed into discrete tasks with clear boundaries, you can fan multiple agents out across them without them colliding or duplicating work. The spec is the shared contract that keeps independent work coherent. Without it, parallel agents each invent their own interpretation of the same vague goal and you spend the saved time reconciling them.
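
"Clear boundaries" can be checked mechanically if each task declares what it touches. The sketch below groups tasks into waves that are safe to run at the same time: a task joins a wave only when its dependencies are finished and no other task in that wave claims the same files. The `Task` fields and the `plan_waves` function are hypothetical, and file paths are only one possible way to express a boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    files: frozenset[str]                    # files the task is allowed to touch
    depends_on: frozenset[str] = frozenset() # names of tasks that must finish first


def plan_waves(tasks: list[Task]) -> list[list[str]]:
    """Group tasks into waves whose members can run in parallel without colliding."""
    remaining = {task.name: task for task in tasks}
    done: set[str] = set()
    waves: list[list[str]] = []

    while remaining:
        wave: list[str] = []
        claimed: set[str] = set()
        for task in remaining.values():
            deps_met = task.depends_on <= done
            no_overlap = claimed.isdisjoint(task.files)
            if deps_met and no_overlap:
                wave.append(task.name)
                claimed |= task.files
        if not wave:
            # Nothing can start: a cycle, or a dependency that is not in the plan.
            raise ValueError("dependency cycle or unknown dependency")
        for name in wave:
            del remaining[name]
        done.update(wave)
        waves.append(wave)

    return waves
```

A plan like this is only as good as the declared boundaries. If a task's file list is wrong, two agents can still collide, so the declarations belong in the spec where they get reviewed with everything else.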

the honest limits

Spec-driven development is not free and it is not always worth it. The discipline carries real costs, and being fair about them is the only way to use it well.

It is overkill for small work. A one-line fix, a copy change, a rename - writing a requirements-design-tasks document for these is slower than just doing them, and pretending otherwise is how a good practice turns into busywork. The size of the spec should match the size and risk of the change, and for trivial work that means no spec at all.

Specs go stale. A spec is only the source of truth if it is maintained. The moment the code drifts from the document and nobody updates the document, the spec becomes a confident, out-of-date lie - worse than no spec, because people trust it. This is the failure that sank earlier model-driven approaches, and it has not gone away.
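
One blunt way to notice drift is to record a fingerprint of the code a spec describes at the moment the spec is approved, then compare it later. The sketch below assumes a spec can list the files it covers. It takes file contents as an in-memory mapping to stay self-contained, where a real check would read them from disk, and the names `fingerprint` and `is_stale` are invented for the example.

```python
import hashlib


def fingerprint(files: dict[str, bytes]) -> str:
    """Hash the covered files' paths and contents, independent of mapping order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot blur together
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


def is_stale(recorded: str, files: dict[str, bytes]) -> bool:
    """True when the code no longer matches what was fingerprinted at approval."""
    return fingerprint(files) != recorded
```

This flags any change, including ones that do not alter behavior, so it can only say "someone should reread the spec," not whether the spec is actually wrong. That still beats finding out from a reader who trusted the document.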

And agents do not always obey. A larger context window and a detailed spec do not guarantee the model follows every line. Specs reduce ambiguity; they do not eliminate the need to review the output. Some practitioners also find verbose markdown specs tedious to review in their own right, which is a real tradeoff rather than a solved problem. Treat the spec as a tool for making intent explicit, not as a contract the model is forced to honor.

how defract approaches this

This is the problem defract was built around, so it is worth being upfront about the bias. defract is a desktop app for running parallel Claude Code agents through a gated lifecycle - story, design, architecture, implementation, review, release - where each stage has to pass an approval gate before the next one starts.

Read that lifecycle against the spec workflow and they line up. The story, design, and architecture stages are the spec - requirements, then design, then the technical plan - produced by the agents and gated for your correction before any implementation runs. The difference from a markdown file in your repo is that the spec is structural rather than optional. You cannot prompt straight to code; the gates are where you review intent instead of diffs, and trivial tasks skip the heavier stages so the spec stays proportional to the work. A memory layer carries the decisions you made forward into later tasks, so the corrections you make on one spec do not have to be remade on the next. The result is spec-driven development built into the product rather than bolted on through a convention you have to remember to follow.
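
The gating idea itself is small enough to sketch. What follows is not defract's implementation, only an illustration of a lifecycle in which a stage has to be approved before the next one can start. The stage names come from the list above; the `Lifecycle` class, the `GateError` exception, and the guess that design and architecture are the "heavier" stages a trivial task skips are all assumptions.

```python
STAGES = ["story", "design", "architecture", "implementation", "review", "release"]

# Assumption: the text does not say which stages count as "heavier".
SKIPPED_WHEN_TRIVIAL = {"design", "architecture"}


class GateError(Exception):
    """Raised when something tries to move past an unapproved stage."""


class Lifecycle:
    def __init__(self, trivial: bool = False) -> None:
        self.stages = [
            stage for stage in STAGES
            if not (trivial and stage in SKIPPED_WHEN_TRIVIAL)
        ]
        self.index = 0
        self.approved: set[str] = set()

    @property
    def current(self) -> str:
        return self.stages[self.index]

    def approve(self) -> None:
        """Record the human sign-off for the current stage."""
        self.approved.add(self.current)

    def advance(self) -> str:
        """Move to the next stage, but only through an approved gate."""
        if self.current not in self.approved:
            raise GateError(f"{self.current} has not passed its gate")
        if self.index == len(self.stages) - 1:
            raise GateError("already at the final stage")
        self.index += 1
        return self.current
```

A default `Lifecycle()` cannot reach implementation without three approvals, while `Lifecycle(trivial=True)` goes from story straight to implementation, which is the proportionality argument expressed as control flow.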

None of this makes specs the answer to every problem. The stale-spec risk and the review burden are real here too, and for a quick fix the gates are something to skip rather than celebrate. Treat spec-driven development as one approach to the deciding-what-to-build problem, not the answer - and treat defract as one opinionated take on that approach.

defract is in open beta

a structured lifecycle for your parallel Claude Code agents. free, no caps, no signup.
