Specs That Hold Up in Review

Short specs help coding agents stay on track through review and implementation.

Rogier Muller · April 19, 2026 · 5 min read

A lot of agentic coding fails before code is written. The weak point is often the spec. If the task is vague, the agent fills in gaps with defaults. If the spec is too long, the important constraints get buried. If it is too rigid, it blocks useful iteration.

The useful middle ground is a spec that can be reviewed, edited, and executed without turning into a second project. The pattern is broader than any one tool: make the spec easier to challenge before implementation starts.

What a durable spec does

A durable spec is not a design doc. It is not a brainstorm dump. It is a short working agreement between the person and the agent.

It should answer four questions clearly:

  • What outcome are we trying to achieve?
  • What is out of scope?
  • What constraints must the implementation respect?
  • How will we know the result is acceptable?

Most agent failures come from missing one of those pieces. The agent then optimizes for the wrong thing, or spends time on edge cases nobody asked for.

The best specs also invite pushback. If a requirement is ambiguous, the spec should surface that ambiguity instead of hiding it. If two constraints conflict, the spec should make that conflict visible.

A practical spec shape

For agentic coding, a spec usually works best when it stays compact and structured. A useful pattern is:

  • Goal
  • Non-goals
  • Constraints
  • Inputs and outputs
  • Acceptance checks
  • Open questions

This is not about formatting for its own sake. It is about making review faster. A reviewer should be able to scan the spec and spot missing assumptions in under a minute.

Write in short sentences. Avoid long paragraphs that mix intent, implementation detail, and edge cases. If the agent needs to infer something important, call it out directly.

For example, instead of saying “improve the import flow,” say “reduce failed imports for CSV files under 5 MB; do not change the file picker; preserve existing validation messages.” That gives the agent a target and gives the reviewer something concrete to check.
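Put together, that example fits the compact shape above. This is a hedged sketch of what the full spec might look like; the headings follow the pattern from earlier, and the details are illustrative:

```markdown
# Goal
Reduce failed imports for CSV files under 5 MB.

# Non-goals
Changing the file picker. Supporting new file formats.

# Constraints
Preserve existing validation messages.

# Inputs and outputs
Input: user-selected CSV file. Output: imported rows, or a
validation message matching current wording.

# Acceptance checks
A previously failing well-formed CSV under 5 MB now imports.
Existing validation messages are unchanged.

# Open questions
How should files exactly at the 5 MB boundary behave?
```

A reviewer can scan this in seconds, and the open question is visible instead of buried in a paragraph.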

Where this helps most

This pattern matters most when the task has any of the following traits:

  • multiple valid implementation paths
  • hidden product constraints
  • user-facing behavior that can regress quietly
  • a need for human approval before merge

Those are the cases where a coding agent can produce something that looks reasonable but misses the real requirement.

A better spec reduces that risk in two ways. First, it narrows the search space. Second, it makes the review conversation more specific. Instead of debating the whole feature, you can debate one assumption at a time.

Implementation steps that hold up

A workable process looks like this:

  1. Draft the spec in plain language before asking for code.
  2. Ask the agent to identify ambiguities, missing constraints, and likely failure modes.
  3. Revise the spec once, then freeze it for the first implementation pass.
  4. Use the spec as the review checklist, not just as context.
  5. If the implementation drifts, update the spec before asking for another pass.

That last step matters. Teams often patch the code while leaving the spec stale. Over time, the spec stops being a source of truth and becomes decorative. Once that happens, the agent loses a key guardrail.
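Parts of this process can be checked mechanically. As a minimal sketch, assuming the spec lives in a markdown file with the section headings from the shape above (the function name and layout here are assumptions, not a standard), a pre-pass script can flag missing sections before the agent starts:

```python
# Hypothetical spec linter: verifies that a spec file contains the
# six sections from the shape above before an implementation pass.
# Section names and markdown layout are assumptions for illustration.

REQUIRED_SECTIONS = [
    "Goal", "Non-goals", "Constraints",
    "Inputs and outputs", "Acceptance checks", "Open questions",
]

def missing_sections(spec_text: str) -> list[str]:
    """Return the required section headings absent from the spec."""
    headings = {
        line.lstrip("#").strip()
        for line in spec_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s not in headings]

draft = """# Goal
Reduce failed imports for CSV files under 5 MB.

# Constraints
Preserve existing validation messages.
"""

# Lists the sections this draft still needs before review.
print(missing_sections(draft))
```

Running it on the draft above reports the absent sections (Non-goals, Inputs and outputs, Acceptance checks, Open questions), which is exactly the conversation a reviewer would otherwise have to start by hand.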

Tradeoffs and limitations

This approach is not free.

A stronger spec takes time up front. For small tasks, that overhead may not pay back. If the change is trivial, a lightweight prompt is often enough.

There is also a risk of over-specifying. If every edge case is written down before the first pass, the task can become slow and brittle. Agents are useful partly because they can explore implementation space. A spec that tries to eliminate all uncertainty can block that.

Another limitation: a good spec does not guarantee a good result. It only improves the odds. The implementation still needs tests, review, and judgment. If the codebase is hard to navigate or the test surface is weak, the spec can only do so much.

What to watch for in practice

When teams adopt this pattern, the main signal is not whether the spec looks polished. It is whether review gets easier.

A good sign is that reviewers spend less time asking “what were we trying to do?” and more time asking “does this satisfy the constraint?” Another good sign is that the agent produces fewer confident but irrelevant changes.

A bad sign is that the spec becomes a ritual document nobody reads. If that happens, shorten it. The point is not completeness. The point is decision quality.

A small methodology note

This fits the Review step well: use the spec as something to challenge before code lands, not just something to generate from.

Bottom line

Agentic coding works better when the spec is treated as an executable agreement. Keep it short. Make constraints explicit. Surface uncertainty early. Use review to tighten the spec, not just the code.

That is a modest change, but it tends to hold up better than relying on the model to infer intent from a loose prompt.
