An AI coding workflow that holds up under audit
An AI coding workflow built on receipts: child receipt blocks, decision stubs, scope ledgers, and precedence files that survive audit.

Crunch week is when the summaries shrink to bullet vibes. The agents kept shipping, the diffs kept landing, and the team discovered that the bottleneck had quietly moved from typing speed to traceability, right when nobody had time to rebuild it. An AI coding workflow that holds up is built for that week, not the calm ones. An AI coding workflow is the agreed loop (brief, edit, verify, receipt, review) that agent-assisted changes follow from prompt to merge. The receipts are what make it auditable later.
The week the summaries shrank
Counter-thesis: trust does not scale when receipts stay in chat, and a faster agent only moves the queue to the part of the system that cannot read chat.
The wrong path: We believed smaller tasks guaranteed safer autonomy. We watched that assumption fail during crunch weeks, when summaries shrank to bullet vibes and the rules files quietly contradicted the skill the agent had just activated.
Diagnosis: Chesterton's fence, unlabeled. Agent diffs remove and rebuild fences constantly, and a reviewer who cannot see why a fence moved has two bad options: block everything or trust everything.
Thesis: traceability is the real throughput lever.
The receipts the audit will ask for
Ritchie-style pragmatism applies: make traceability easy before you make generation easy.
Recursive handoff blur. Chained agents return summaries that omit child-owned paths, the telephone game with commit access.
Named fix: Child receipt block. Every child returns the paths it touched, the commands it ran, and the tests that prove its regression guards hold. Parents stop green-lighting mystery diffs.
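One way to make the child receipt block checkable is to give it a fixed shape. The sketch below is an assumption, not a format any of the tools named here define: `ChildReceipt`, `unreceipted_paths`, and `receipts_missing_proof` are hypothetical names, and the fields simply mirror the three items in the fix (paths, commands, tests).

```python
from dataclasses import dataclass, field


@dataclass
class ChildReceipt:
    """What one child agent reports back to its parent (hypothetical shape)."""
    agent: str
    paths_touched: list[str]
    commands_run: list[str]
    regression_tests: list[str] = field(default_factory=list)


def unreceipted_paths(diff_paths: list[str], receipts: list[ChildReceipt]) -> list[str]:
    """Return paths present in the diff that no child receipt claims."""
    claimed = {path for receipt in receipts for path in receipt.paths_touched}
    return sorted(set(diff_paths) - claimed)


def receipts_missing_proof(receipts: list[ChildReceipt]) -> list[str]:
    """Return agents whose receipts list no commands or no regression tests."""
    return [r.agent for r in receipts if not r.commands_run or not r.regression_tests]
```

A parent could hold its own summary until both functions return empty lists, which turns "mystery diff" into a named path and a named child.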
Review queue theater. CI is green and reviewers still ask why this approach, with no written answer anywhere.
Named fix: Decision stub. The PR template forces three lines: constraints considered, rejected alternatives, verification proof. The fence gets a label before anyone moves it.
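The three-line stub is easy to enforce mechanically. As a minimal sketch, assuming the PR template writes each field as a `Label: text` line (the exact labels and the function name `missing_stub_fields` are illustrative, not a standard), a check might look like this:

```python
import re

# Assumed labels, taken from the three lines the decision stub asks for.
REQUIRED_FIELDS = (
    "Constraints considered",
    "Rejected alternatives",
    "Verification proof",
)


def missing_stub_fields(pr_body: str) -> list[str]:
    """Return the stub fields that are absent or left empty in the PR body."""
    missing = []
    for name in REQUIRED_FIELDS:
        # A field counts only if non-whitespace text follows the colon on the same line.
        pattern = rf"^\s*{re.escape(name)}\s*:[ \t]*\S"
        if not re.search(pattern, pr_body, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing
```

The check only proves the fields were filled in, not that the reasoning is sound; that judgment stays with the reviewer.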
Cursor scope fog. Teams shipping Cursor agent work weekly watch .mdc language sound precise until reviewers argue about what it meant. Rules compete with chat memory.
Named fix: Scope ledger. The parent chat carries a five-line ledger: goal, allowed paths, forbidden paths, verification command, merge owner. Review checks ledgers against diffs instead of debating prompts.
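Checking a ledger against a diff can be sketched in a few lines. This assumes the allowed and forbidden paths are written as shell-style glob patterns; `ScopeLedger` and `out_of_scope` are hypothetical names, and the five fields are the five ledger lines described above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class ScopeLedger:
    """The five ledger lines the parent chat carries (hypothetical shape)."""
    goal: str
    allowed_paths: list[str]
    forbidden_paths: list[str]
    verification_command: str
    merge_owner: str


def out_of_scope(ledger: ScopeLedger, diff_paths: list[str]) -> list[str]:
    """Return diff paths that are forbidden or not covered by any allowed pattern."""
    violations = []
    for path in diff_paths:
        # Note: fnmatch-style "*" also matches "/" so "src/billing/*" covers subfolders.
        forbidden = any(fnmatchcase(path, pat) for pat in ledger.forbidden_paths)
        allowed = any(fnmatchcase(path, pat) for pat in ledger.allowed_paths)
        if forbidden or not allowed:
            violations.append(path)
    return violations
```

Forbidden patterns win over allowed ones here, a design choice that lets a ledger open a folder while still fencing off one part of it.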
Claude permission creep. On shared laptops, approving Claude Code bash commands becomes muscle memory. Permission literacy cannot live in habit; it needs precedence written into a file.
Named fix: CLAUDE.md supremacy clause. The top of CLAUDE.md states which hooks win, which folders require human eyes, and where temporary overrides live. Sessions stop inventing policy mid-run.
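Whether the clause actually sits at the top of the file can be linted. CLAUDE.md is free-form, so the heading names below are assumptions chosen for illustration, as is the function name `clause_gaps` and the 40-line cutoff for what counts as "the top".

```python
# Hypothetical headings, one per item the supremacy clause is meant to state.
REQUIRED_HEADINGS = (
    "## Hook precedence",
    "## Human-review folders",
    "## Temporary overrides",
)


def clause_gaps(claude_md_text: str, top_lines: int = 40) -> list[str]:
    """Return required headings that do not appear within the first top_lines lines."""
    head = [line.strip().lower() for line in claude_md_text.splitlines()[:top_lines]]
    return [h for h in REQUIRED_HEADINGS if h.lower() not in head]
```

A pre-merge check could fail when this list is non-empty, so a session never starts against a file that leaves precedence unstated.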
```markdown
---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
- "**/*"
alwaysApply: false
---
- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.
```
This routes through our methodology at the Review gate: parallel agent output must be inspectable without replaying sessions. The companion patterns live on the agentic coding governance page, and "Specs and tests as the stable stack" covers the contract these receipts are checked against.
Audit questions worth automating
An auditable workflow answers these from the PR body alone.
| Gate | Question |
|---|---|
| Replay proof | Which commands prove regression guards? |
| Receipt match | Does the PR body list the scopes and the verification transcript? |
| Rules precedence | Which .mdc, SKILL.md, or CLAUDE.md governed behavior? |
| Connector truth | Which MCP servers fired, and were they expected? |
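The four gates can be expressed as one pass over a structured PR record. This is a sketch under stated assumptions: `PRRecord` and `audit` are hypothetical names, and the record is assumed to have been parsed from the PR body by some earlier step that is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class PRRecord:
    """Fields an audit would read from the PR body (hypothetical shape)."""
    verification_commands: list[str] = field(default_factory=list)
    scopes: list[str] = field(default_factory=list)
    transcript: str = ""
    governing_rules: list[str] = field(default_factory=list)
    mcp_fired: set[str] = field(default_factory=set)
    mcp_expected: set[str] = field(default_factory=set)


def audit(pr: PRRecord) -> dict[str, bool]:
    """Map each gate in the table to True when the PR record satisfies it."""
    return {
        "Replay proof": bool(pr.verification_commands),
        "Receipt match": bool(pr.scopes) and bool(pr.transcript.strip()),
        "Rules precedence": bool(pr.governing_rules),
        # Every MCP server that fired must be in the expected set.
        "Connector truth": pr.mcp_fired <= pr.mcp_expected,
    }
```

Listing the gates that come back `False` gives the reviewer a short, specific set of questions instead of a session to replay.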
Synthesis: agents are relief crews; the blueprint still belongs to the humans standing outside the trench.
If your repo cannot state boundaries plainly, agents will guess, and guessing is the one behavior that gets worse with scale.
Best ways to use this research
- Best for: Cursor teams deciding which rule, subagent, skill, or MCP boundary to standardize next in their AI coding workflow.
- Best first artifact: turn the child receipt block into a .mdc rule, AGENTS.md note, subagent receipt, or review checklist before the next automated run.
- Best comparison angle: compare the receipt-first loop against the current Cursor review path, connector scope, and team rule file; keep the path that leaves the shortest auditable trail.
Common questions
What does a good AI coding workflow look like? A loop where every step leaves an artifact: a scope ledger before the run, receipts and transcripts during it, a decision stub in the PR, and precedence files in the repo. The test is whether a reviewer can defend the merge without replaying the chat.
How do we audit agent work after the fact? From the receipts. Child receipt blocks list paths, commands, and regression tests; decision stubs preserve constraints and rejected alternatives; the scope ledger shows what was allowed. If those artifacts are missing, the audit becomes archaeology, and archaeology during crunch week does not happen.
How do we make AI-written code easier to review? Shrink what the reviewer must reconstruct. Ship the verification command and its output with the diff, keep scopes in the PR body, and label every removed fence with the reason. Reviewers move fast when the narrative arrives with the change.
Further reading
- OpenAI Developers: Codex quickstart
- OWASP Top 10 for Large Language Model Applications
- NIST AI Risk Management Framework
- Google Search Central: helpful, people-first content
- Google Search Central: generative AI content guidance
- Model Context Protocol specification
- OpenAI Skills repository
Next step
The white paper turns this loop into a checklist your next audit can run against: read the white paper.
Related research

MCP training for engineering teams
Practical MCP training for engineering teams using agentic coding, review guardrails, and connector boundaries.

Codex workspace agents need repo rules
Codex workspace agents and Cursor cloud agents need repo rules: scoped boundary files, connector cards, and replay receipts reviewers can check.

Fast mode is not the default: when fast models earn it
The fast model is a tradeoff you make on purpose: scope ledgers, replay sandwiches, and connector cards that keep fast agent runs reviewable.