Specs That Hold Up in Review

Short specs help coding agents stay on track through review and implementation.

Rogier Muller · April 19, 2026 · 5 min read

A lot of agentic coding fails before code is written. The weak point is often the spec. If the task is vague, the agent fills in gaps with defaults. If the spec is too long, the important constraints get buried. If it is too rigid, it blocks useful iteration.

The useful middle ground is a spec that can be reviewed, edited, and executed without turning into a second project. The pattern is broader than any one tool: make the spec easier to challenge before implementation starts.

What a durable spec does

A durable spec is not a design doc. It is not a brainstorm dump. It is a short working agreement between the person and the agent.

It should answer four questions clearly:

  • What outcome are we trying to achieve?
  • What is out of scope?
  • What constraints must the implementation respect?
  • How will we know the result is acceptable?

Most agent failures come from missing one of those pieces. The agent then optimizes for the wrong thing, or spends time on edge cases nobody asked for.

The best specs also invite pushback. If a requirement is ambiguous, the spec should surface that ambiguity instead of hiding it. If two constraints conflict, the spec should make that conflict visible.

A practical spec shape

For agentic coding, a spec usually works best when it stays compact and structured. A useful pattern is:

  • Goal
  • Non-goals
  • Constraints
  • Inputs and outputs
  • Acceptance checks
  • Open questions

This is not about formatting for its own sake. It is about making review faster. A reviewer should be able to scan the spec and spot missing assumptions in under a minute.

Write in short sentences. Avoid long paragraphs that mix intent, implementation detail, and edge cases. If the agent needs to infer something important, call it out directly.

For example, instead of saying “improve the import flow,” say “reduce failed imports for CSV files under 5 MB; do not change the file picker; preserve existing validation messages.” That gives the agent a target and gives the reviewer something concrete to check.
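Put together, that example fits the compact shape above. This is a hedged sketch of what the full spec might look like; the headings follow the pattern from earlier, and the details are illustrative:

```markdown
# Goal
Reduce failed imports for CSV files under 5 MB.

# Non-goals
Changing the file picker. Supporting new file formats.

# Constraints
Preserve existing validation messages.

# Inputs and outputs
Input: user-selected CSV file. Output: imported rows, or a
validation message matching current wording.

# Acceptance checks
A previously failing well-formed CSV under 5 MB now imports.
Existing validation messages are unchanged.

# Open questions
How should files exactly at the 5 MB boundary behave?
```

A reviewer can scan this in seconds, and the open question is visible instead of buried in a paragraph.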

Where this helps most

This pattern matters most when the task has any of the following traits:

  • multiple valid implementation paths
  • hidden product constraints
  • user-facing behavior that can regress quietly
  • a need for human approval before merge

Those are the cases where a coding agent can produce something that looks reasonable but misses the real requirement.

A better spec reduces that risk in two ways. First, it narrows the search space. Second, it makes the review conversation more specific. Instead of debating the whole feature, you can debate one assumption at a time.

Implementation steps that hold up

A workable process looks like this:

  1. Draft the spec in plain language before asking for code.
  2. Ask the agent to identify ambiguities, missing constraints, and likely failure modes.
  3. Revise the spec once, then freeze it for the first implementation pass.
  4. Use the spec as the review checklist, not just as context.
  5. If the implementation drifts, update the spec before asking for another pass.

That last step matters. Teams often patch the code while leaving the spec stale. Over time, the spec stops being a source of truth and becomes decorative. Once that happens, the agent loses a key guardrail.
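Parts of this process can be checked mechanically. As a minimal sketch, assuming the spec lives in a markdown file with the section headings from the shape above (the function name and layout here are assumptions, not a standard), a pre-pass script can flag missing sections before the agent starts:

```python
# Hypothetical spec linter: verifies that a spec file contains the
# six sections from the shape above before an implementation pass.
# Section names and markdown layout are assumptions for illustration.

REQUIRED_SECTIONS = [
    "Goal", "Non-goals", "Constraints",
    "Inputs and outputs", "Acceptance checks", "Open questions",
]

def missing_sections(spec_text: str) -> list[str]:
    """Return the required section headings absent from the spec."""
    headings = {
        line.lstrip("#").strip()
        for line in spec_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s not in headings]

draft = """# Goal
Reduce failed imports for CSV files under 5 MB.

# Constraints
Preserve existing validation messages.
"""

# Lists the sections this draft still needs before review.
print(missing_sections(draft))
```

Running it on the draft above reports the absent sections (Non-goals, Inputs and outputs, Acceptance checks, Open questions), which is exactly the conversation a reviewer would otherwise have to start by hand.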

Tradeoffs and limitations

This approach is not free.

A stronger spec takes time up front. For small tasks, that overhead may not pay back. If the change is trivial, a lightweight prompt is often enough.

There is also a risk of over-specifying. If every edge case is written down before the first pass, the task can become slow and brittle. Agents are useful partly because they can explore implementation space. A spec that tries to eliminate all uncertainty can block that.

Another limitation: a good spec does not guarantee a good result. It only improves the odds. The implementation still needs tests, review, and judgment. If the codebase is hard to navigate or the test surface is weak, the spec can only do so much.

What to watch for in practice

When teams adopt this pattern, the main signal is not whether the spec looks polished. It is whether review gets easier.

A good sign is that reviewers spend less time asking “what were we trying to do?” and more time asking “does this satisfy the constraint?” Another good sign is that the agent produces fewer confident but irrelevant changes.

A bad sign is that the spec becomes a ritual document nobody reads. If that happens, shorten it. The point is not completeness. The point is decision quality.

A small methodology note

This fits the Review step well: use the spec as something to challenge before code lands, not just something to generate from.

Bottom line

Agentic coding works better when the spec is treated as an executable agreement. Keep it short. Make constraints explicit. Surface uncertainty early. Use review to tighten the spec, not just the code.

That is a modest change, but it tends to hold up better than relying on the model to infer intent from a loose prompt.
