Specs and tests: the stable stack for AI coding

The incident review found the contradiction before we did: the rules file said one thing, the skill the agent had just activated said another, and the agent had shipped on the skill while the sprint clock ran. Parallelism punishes fuzzy scopes first. That is the case for specs and tests as the stable layer in AI coding. Specs and tests are the written contract, expected behavior plus executable proof, that every agent-written change gets checked against. Stacks and models rotate; the contract is what holds.

Where the rules file and the skill disagreed

Counter-thesis: review cannot compress what the repo never recorded, and no amount of model quality writes the record for you.

The wrong path: We believed reviewers would absorb implicit intent. We tested that belief at a time when hooks existed but nobody read their transcripts, and forks that returned no receipts ate the sprint budget before lint ever failed.

Diagnosis: Brooks's law applies to agents the way it applied to people. Adding people to a late project makes it later, and adding agents to underspecified work does the same. Agents are workers you can add in seconds, which makes the spec the bottleneck on day one.

Thesis: stable specs and tests are what make agent parallelism safe.

Four fixes that stabilize the stack

The stable stack is not a framework choice; it is the set of artifacts that stay true while agents work in parallel.

Cursor scope fog. `.mdc` language sounds precise until reviewers argue about what it meant. Rules compete with chat memory, and the result is split-brain coordination without a referee.

Named fix: Scope ledger. The parent chat carries a five-line ledger: goal, allowed paths, forbidden paths, verification command, merge owner. Review shifts from debating prompts to checking ledgers against diffs.
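As a rough sketch of how a ledger can be checked against a diff, the snippet below models the five lines as a small record and flags changed paths that fall outside it. The class name, field names, glob patterns, and sample paths are all illustrative assumptions, not part of any tool's format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ScopeLedger:
    """The five ledger lines; names here are hypothetical."""
    goal: str
    allowed_paths: tuple[str, ...]    # glob patterns the fork may touch
    forbidden_paths: tuple[str, ...]  # glob patterns that always lose
    verification_command: str
    merge_owner: str


def out_of_scope(ledger: ScopeLedger, changed: list[str]) -> list[str]:
    """Return changed paths that are forbidden or not covered by an allowed glob."""
    violations = []
    for path in changed:
        forbidden = any(fnmatchcase(path, pat) for pat in ledger.forbidden_paths)
        allowed = any(fnmatchcase(path, pat) for pat in ledger.allowed_paths)
        if forbidden or not allowed:
            violations.append(path)
    return violations


# Made-up ledger and diff, for illustration only.
ledger = ScopeLedger(
    goal="Add retry to the billing client",
    allowed_paths=("src/billing/*", "tests/billing/*"),
    forbidden_paths=("src/billing/migrations/*",),
    verification_command="pytest tests/billing",
    merge_owner="parent-chat",
)
changed = [
    "src/billing/client.py",
    "src/auth/session.py",
    "src/billing/migrations/0007.sql",
]
print(out_of_scope(ledger, changed))
# ['src/auth/session.py', 'src/billing/migrations/0007.sql']
```

Forbidden patterns win over allowed ones here, so a broad allowed glob cannot quietly re-admit a path the ledger ruled out.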

Recursive handoff blur. Parallel children return summaries that omit child-owned paths. It is the telephone game at machine speed.

Named fix: Child receipt block. Every child returns the paths it touched, the commands it ran, and the tests proving regression guards. Parents merge evidence instead of confidence.
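One possible shape for a receipt, with a check the parent could run before merging, is sketched below. The `ChildReceipt` type, its field names, and the sample data are assumptions made for illustration; the only idea taken from the text is that a receipt lists paths, commands, and guard tests.

```python
from dataclasses import dataclass, field


@dataclass
class ChildReceipt:
    """What a child hands back to its parent; names are hypothetical."""
    child: str
    paths_touched: list[str]
    commands_run: list[str]
    guard_tests: list[str] = field(default_factory=list)  # tests proving regression guards


def receipt_gaps(receipt: ChildReceipt, diff_paths: set[str]) -> list[str]:
    """List reasons a parent should not merge on this receipt alone."""
    gaps = []
    unreported = sorted(diff_paths - set(receipt.paths_touched))
    if unreported:
        gaps.append(f"paths in diff but not in receipt: {unreported}")
    if not receipt.commands_run:
        gaps.append("no commands recorded")
    if not receipt.guard_tests:
        gaps.append("no regression-guard tests named")
    return gaps


# Made-up receipt: the child forgot one path and named no guard tests.
receipt = ChildReceipt(
    child="child-a",
    paths_touched=["src/billing/client.py"],
    commands_run=["pytest tests/billing"],
)
gaps = receipt_gaps(receipt, {"src/billing/client.py", "src/billing/retry.py"})
for gap in gaps:
    print(gap)
```

Comparing the receipt with the actual diff, rather than trusting the summary, is what turns the handoff into evidence.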

Review queue theater. CI is green and reviewers still ask why this approach, with no written answer. Humans optimize for checks passing.

Named fix: Decision stub. The PR template forces three lines: constraints considered, rejected alternatives, verification proof. Debate moves from vibes to explicit tradeoffs.
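A check like the one below could back the template in CI. It is a sketch under assumptions: the three label strings are one hypothetical wording of the stub, and the PR body is passed in as plain text rather than fetched from any particular platform.

```python
import re

# Hypothetical label wording for the three forced lines.
REQUIRED_LABELS = (
    "Constraints considered",
    "Rejected alternatives",
    "Verification proof",
)


def missing_decision_lines(pr_body: str) -> list[str]:
    """Return the required labels that are absent or left empty in the PR body."""
    missing = []
    for label in REQUIRED_LABELS:
        # A line counts only if non-whitespace text follows the colon on the same line.
        pattern = rf"^{re.escape(label)}:[ \t]*\S"
        if not re.search(pattern, pr_body, flags=re.MULTILINE | re.IGNORECASE):
            missing.append(label)
    return missing


# Made-up PR body with one line left blank.
body = """Constraints considered: must not change the public client API
Rejected alternatives:
Verification proof: pytest tests/billing (transcript attached)
"""
print(missing_decision_lines(body))
# ['Rejected alternatives']
```

The check only proves the lines were filled in; whether the tradeoffs are sound remains the reviewer's call.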

MCP blast radius. A connector wired for a demo ends up touching data nobody listed on the diagram. Least privilege needs explicit trust boundaries.

Named fix: Connector card. One markdown card per MCP server: allowed actions, forbidden actions, owner, rollback. Operators know what off looks like before they need it.
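To make the card concrete, here is a minimal sketch that keeps the four fields in one record, renders them as a markdown card, and answers whether an action is permitted. The server name, action names, and rollback text are invented for the example and do not describe any real MCP server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorCard:
    """One card per MCP server; all values below are hypothetical."""
    server: str
    allowed_actions: frozenset[str]
    forbidden_actions: frozenset[str]
    owner: str
    rollback: str  # what "off" looks like for this connector


def is_permitted(card: ConnectorCard, action: str) -> bool:
    """Default deny: an action must be listed as allowed and not forbidden."""
    return action in card.allowed_actions and action not in card.forbidden_actions


def render_card(card: ConnectorCard) -> str:
    """Render the card as a short markdown document."""
    lines = [
        f"# {card.server}",
        f"- Allowed actions: {', '.join(sorted(card.allowed_actions))}",
        f"- Forbidden actions: {', '.join(sorted(card.forbidden_actions))}",
        f"- Owner: {card.owner}",
        f"- Rollback: {card.rollback}",
    ]
    return "\n".join(lines)


card = ConnectorCard(
    server="issues-mcp",
    allowed_actions=frozenset({"read_issue", "comment_issue"}),
    forbidden_actions=frozenset({"delete_repo"}),
    owner="platform-team",
    rollback="remove the server entry from the MCP config and revoke its token",
)
print(render_card(card))
print(is_permitted(card, "read_issue"), is_permitted(card, "export_users"))
```

Anything not written on the card is denied, which matches the least-privilege stance: an unlisted action is a question for the owner, not a default yes.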

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

In our methodology the spec belongs to Plan: agents inherit whatever the Plan step wrote down, and nothing else. The cluster's remaining patterns live on the agentic coding governance page, and the article "AI coding tools that keep working after rollout" shows what happens to the same contract at tool-selection time.

Checking the contract at merge

A spec has done its job when the reviewer never has to ask, "Why did the agent touch this file?"

| Gate | Question |
| --- | --- |
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove regression guards? |
| Receipt match | Does the PR body list scopes + verification transcript? |
| Rules precedence | Which `.mdc`, `SKILL.md`, or `CLAUDE.md` governed behavior? |
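The four gates can be approximated as a single pre-merge check. The sketch below is one way to do that under loud assumptions: the `PullRequest` record, the `RED_FOLDERS` list, and the `Scopes:` and `Verification:` markers in the PR body are all hypothetical conventions chosen for the example.

```python
from dataclasses import dataclass, field

RED_FOLDERS = ("infra/", "billing/")  # hypothetical high-risk path prefixes


@dataclass
class PullRequest:
    """A simplified, hypothetical view of what the gates need from a PR."""
    changed_paths: list[str]
    body: str
    approvals: dict[str, str] = field(default_factory=dict)  # red folder -> approver
    replay_commands: list[str] = field(default_factory=list)
    governing_rules: list[str] = field(default_factory=list)  # e.g. ["CLAUDE.md"]


def failed_gates(pr: PullRequest) -> list[str]:
    """Return the names of the gates this PR does not pass."""
    failures = []
    touched_red = {
        folder for folder in RED_FOLDERS
        if any(path.startswith(folder) for path in pr.changed_paths)
    }
    if touched_red - set(pr.approvals):
        failures.append("Risk routing")
    if not pr.replay_commands:
        failures.append("Replay proof")
    if "Scopes:" not in pr.body or "Verification:" not in pr.body:
        failures.append("Receipt match")
    if not pr.governing_rules:
        failures.append("Rules precedence")
    return failures


# Made-up PR: touches a red folder without a recorded approver.
pr = PullRequest(
    changed_paths=["billing/client.py", "docs/readme.md"],
    body="Scopes: billing/\nVerification: pytest tests/billing",
    replay_commands=["pytest tests/billing"],
    governing_rules=["CLAUDE.md"],
)
print(failed_gates(pr))
# ['Risk routing']
```

A script like this can only confirm that answers exist; judging whether the approval or the replay proof is adequate stays with the reviewer.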

Synthesis: picture two clocks, one for shipping and one for explainability. When only the shipping clock ticks, the team pays later with interest.

Hard constraints still belong to humans: threat models, customer promises, and blast radius decisions stay off autopilot.

Best ways to use this research

Best for: Cursor teams deciding which rule, subagent, skill, or MCP boundary to standardize next around specs and tests for agent work.
Best first artifact: turn the scope ledger into a .mdc rule, AGENTS.md note, subagent receipt, or review checklist before the next parallel run.
Best comparison angle: compare the spec-first workflow against the current Cursor review path, connector scope, and team rule file; keep the path that leaves the shortest auditable trail.

Common questions

Do coding agents need specs and tests to be reliable? Yes, because they inherit ambiguity faster than people do. Specs pin expected behavior, tests prove it by running, and the pair gives parallel agents a contract that does not depend on chat memory. Without it, every fork resolves ambiguity its own way.

Why does parallel agent work fail without clear scopes? Because parallelism punishes fuzzy scopes first. Two agents with overlapping, unwritten boundaries will both touch the contested path, and the conflict surfaces at merge time as a mystery. A five-line scope ledger per fork keeps the boundaries checkable against the diffs.

What is a scope ledger? Five lines carried in the parent chat: goal, allowed paths, forbidden paths, verification command, merge owner. It is the smallest spec that still lets a reviewer test an agent diff against a written boundary instead of against someone's memory of the prompt.

Next step

If your specs live in chat and your tests live in hope, we can help you move both into the repo: contact us.
