The AI code review workflow that survives green CI
An AI code review workflow for agentic teams: connector ownership, scoped fixes, decision stubs, and replay evidence that hold up when CI is green.

Green CI tells you the code compiled and the tests passed. It does not tell you why an agent chose this approach, which folders it was allowed to touch, or whether anyone watched the verification run. An AI code review workflow is a small set of files in your repo that records those answers, so a reviewer can approve agent-written changes without replaying the session that produced them. We keep seeing the same thing in tech-lead pairing sessions: the checks are green, the merge stalls, and nobody can write down why the diff looks the way it does.
This matters more once you run agents in parallel. Tools like Claude Code (Anthropic's coding agent), Cursor from Anysphere, and Codex CLI from OpenAI all make it cheap to produce a passing diff while leaving the context behind it thin. The fix is not more autonomy. It is leaving receipts.
Ask "why this approach?" and answer it in the PR
The most common gap is the simplest one. CI is green, a reviewer asks why this approach over the obvious alternative, and the only answer lives in a chat log they cannot see. So the question gets skipped and the merge goes through on trust.
A decision stub closes that gap. It is three required lines in your PR template: constraints considered, rejected alternatives, and verification proof. Three lines is enough to move a review from a feeling to a written tradeoff someone can accept or push back on.
## Decision stub
- Constraints considered: <perf budget, auth boundary, schema lock>
- Rejected alternatives: <what you tried or ruled out, and why>
- Verification proof: <command run + pasted/linked output>
The stub is annoying to fill in when the change is sloppy, which is the point. If the author cannot name a rejected alternative, the review just found a judgment gap before merge instead of after.
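If you want the template enforced rather than merely suggested, a small check can flag stubs that were left empty or still hold the template placeholder. The following is a minimal sketch, not a prescribed implementation: the function name `missing_stub_fields` is hypothetical, it assumes the PR description is already available as a string, and the placeholder test is a rough heuristic that treats any value written entirely as `<...>` as unfilled.

```python
import re

# Field labels copied from the decision stub template.
REQUIRED_FIELDS = (
    "Constraints considered",
    "Rejected alternatives",
    "Verification proof",
)


def missing_stub_fields(pr_body: str) -> list[str]:
    """Return stub fields that are absent, empty, or still a <placeholder>."""
    missing = []
    for field in REQUIRED_FIELDS:
        # Match "- Field: value" on one line; [ \t]* keeps the match from
        # spilling onto the following line when the value is empty.
        pattern = rf"^-[ \t]*{re.escape(field)}:[ \t]*(.*)$"
        match = re.search(pattern, pr_body, re.MULTILINE)
        value = match.group(1).strip() if match else ""
        # Heuristic: an untouched template value looks like "<...>".
        if not value or re.fullmatch(r"<.*>", value):
            missing.append(field)
    return missing


# Hypothetical PR body: one field filled in, one placeholder, one empty.
example_body = (
    "## Decision stub\n"
    "- Constraints considered: p95 latency budget, auth boundary\n"
    "- Rejected alternatives: <what you tried or ruled out, and why>\n"
    "- Verification proof: \n"
)

print(missing_stub_fields(example_body))
# ['Rejected alternatives', 'Verification proof']
```

How the PR body reaches the script (a CI step, a bot, a pre-merge hook) is left open here, since that depends on your hosting setup.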
Pin the scope so reviewers check the diff, not the prompt
Agent work drifts. A Cursor .mdc rule reads as precise until two reviewers argue about what it actually permitted. The diff touches a file nobody expected, and the conversation turns into prompt archaeology.
Carry a five-line scope ledger in the parent chat: goal, allowed paths, forbidden paths, verification command, merge owner. Now review is mechanical. You hold the ledger next to the diff and check whether the agent stayed inside the lines it was given.
| Gate | Question with a yes/no answer |
|---|---|
| Connector truth | Which MCP servers fired, and were they expected? |
| Reviewer path | Can someone unfamiliar trace intent without chat replay? |
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove the regression guards held? |
A gate is only real if the PR answers its question without anyone opening a terminal. If an answer lives only in chat, the merge is not finished.
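Holding the scope ledger next to the diff can be partly mechanized. The sketch below assumes the ledger's five fields are captured in a small data structure and that the list of changed files comes from elsewhere (for example, your VCS). The class `ScopeLedger`, the function `out_of_scope`, and every path and owner in the example are hypothetical. Patterns use Python's `fnmatch` semantics, where `*` also matches `/`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ScopeLedger:
    """The five ledger lines: goal, allowed, forbidden, verification, owner."""
    goal: str
    allowed_paths: tuple[str, ...]
    forbidden_paths: tuple[str, ...]
    verification_command: str
    merge_owner: str


def out_of_scope(ledger: ScopeLedger, changed_files: list[str]) -> list[str]:
    """Return changed files that hit a forbidden pattern or miss every allowed one."""
    violations = []
    for path in changed_files:
        forbidden = any(fnmatchcase(path, pat) for pat in ledger.forbidden_paths)
        allowed = any(fnmatchcase(path, pat) for pat in ledger.allowed_paths)
        # Forbidden patterns win over allowed ones.
        if forbidden or not allowed:
            violations.append(path)
    return violations


# Hypothetical ledger and diff, for illustration only.
ledger = ScopeLedger(
    goal="Add rate limiting to the public API",
    allowed_paths=("src/api/*", "tests/api/*"),
    forbidden_paths=("src/api/auth/*",),
    verification_command="pytest tests/api",
    merge_owner="@api-lead",
)
changed = ["src/api/limits.py", "src/api/auth/tokens.py", "infra/deploy.yaml"]

print(out_of_scope(ledger, changed))
# ['src/api/auth/tokens.py', 'infra/deploy.yaml']
```

A check like this only answers whether the agent stayed inside the lines. Whether the lines were drawn in the right place is still a reviewer's call.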
Write down which rules win before a session invents its own
On a shared laptop, bash approvals in Claude Code turn into muscle memory. Hooks help, but an agent still needs to know which rule outranks which when they conflict mid-run.
Put a short precedence block at the top of CLAUDE.md: which hooks win, which folders need human eyes, and where temporary overrides live. Sessions stop inventing policy on the fly because the precedence is already written down. For CLI agents, do the same favor for review: AGENTS.md can require an intent line, then the command transcript, then a diff summary before the PR opens, so a reviewer reproduces the run instead of standing behind your terminal.
Here is a delegation snapshot you can drop in and adapt:
---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
- "**/*"
alwaysApply: false
---
- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.
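The order that AGENTS.md can require before a PR opens (intent line, then command transcript, then diff summary) is also easy to check mechanically. This is a rough sketch under stated assumptions: the three labels in `REPLAY_ORDER` are a hypothetical convention, not something Codex or any other tool defines, and the function name `replay_order_ok` is made up for illustration.

```python
# Hypothetical section labels; pick whatever your AGENTS.md actually mandates.
REPLAY_ORDER = ("Intent:", "Commands:", "Diff summary:")


def replay_order_ok(notes: str) -> bool:
    """True when every label starts a line and the labels appear in order."""
    lines = [line.strip() for line in notes.splitlines()]
    positions = []
    for label in REPLAY_ORDER:
        hits = [i for i, line in enumerate(lines) if line.startswith(label)]
        if not hits:
            return False  # a required section is missing
        positions.append(hits[0])
    # First occurrences must already be in ascending line order.
    return positions == sorted(positions)


# Hypothetical replay notes, for illustration only.
good_notes = (
    "Intent: cap retries on the payments client\n"
    "Commands:\n"
    "$ pytest tests/payments\n"
    "Diff summary: 2 files changed\n"
)
out_of_order = (
    "Commands:\n"
    "$ pytest tests/payments\n"
    "Intent: cap retries on the payments client\n"
    "Diff summary: 2 files changed\n"
)

print(replay_order_ok(good_notes))    # True
print(replay_order_ok(out_of_order))  # False
```

The check confirms structure only. It does not prove the pasted transcript is genuine, which is why the verification output should be linked to a run a reviewer can reproduce.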
Run the reviewer handoff as a checklist
The whole workflow comes down to a short list a reviewer can run in under a minute. Paste this into your PR template under the decision stub:
- MCP connectors used (if any) are listed with their owners.
- Verification command output is pasted or linked.
- Forked agent work names parent and child responsibilities.
- Red-folder paths got explicit human acknowledgement.
Some decisions never go on autopilot, no matter how clean the receipts look. Threat models, customer promises, and blast-radius calls stay with a human. The workflow makes everything else fast precisely so people spend their attention where it counts.
If you want the wider rulebook, the agentic coding governance topic page collects the full set, and the practice of routing parallel output through one review gate is in our methodology.
Common questions
- What does an AI code review workflow add when CI is already green?
  It adds the judgment layer that checks cannot measure. A decision stub states the constraints and the rejected alternatives, a scope ledger matches the diff to what was allowed, and a replay transcript proves the verification actually ran. Green CI says the code passed. The workflow says a person can explain it.
- What is a decision stub in a PR template?
  A decision stub is three required lines: constraints considered, rejected alternatives, and verification proof. It exists because people optimize for the checks passing, not for explaining themselves. The stub moves review debate from vibes to explicit tradeoffs a reviewer can accept or challenge in writing, before the merge lands.
- How does the replay sandwich work for CLI agents?
  AGENTS.md mandates a fixed order: an intent line, then the command transcript, then a diff summary, all written before the PR opens. That makes the review reproducible without anyone standing behind a terminal. It is the fix for merged green builds that nobody actually watched run.
- Which review questions need file-backed answers before merge?
  Four of them: which rule file governed behavior, which MCP servers fired and whether they were expected, whether someone unfamiliar can trace intent without chat replay, and whether red folders were touched and who approved. If any of those answers lives only in a chat log, the merge is not done yet.
- Does this slow teams down?
  A little at write time, far less at review time. Filling a three-line stub costs the author a minute. Skipping it costs a reviewer twenty minutes of prompt archaeology, or costs the team an outage they cannot explain later. The receipts pay for themselves the first time a green build hides a bad call.
Next move
Add the decision stub to your PR template today, then bring your team through the rest of the contract in training before you add the next agent to the pipeline.