Back to Research

Agentic coding governance for engineering teams

Agentic coding governance for engineering teams: the written contracts, decision stubs, scope ledgers, and replay receipts, that keep agent diffs explainable.

Château de Pierrefonds restauré par Viollet-le-Duc, landscape painting by Paul Huet.
Rogier MullerMay 7, 20266 min read

Agentic coding governance is the set of written contracts that keep an agent's changes explainable after the chat is closed. It governs the delegation, not the model: who allowed what, who verified it, and who owns the merge. Most teams discover they need it in a retro, when a diff bundles a refactor with an intent shift nobody flagged and the one person who could explain it has rotated off. Green CI told you the code ran. It did not tell you why the agent did what it did.

The good news is the fix is small and boring. Four short contracts, each one answering a question a reviewer already asks out loud. You can adopt them one at a time, this week, on your real merge queue.

Why agent diffs stop being explainable

Tool access grows faster than ownership. You connect another MCP server, then another, and soon your agent can reach everything while no human can say who owns the reach. The changes still merge. They just arrive in a shape nobody can narrate.

The pattern is old. Systems tend to mirror how the team communicates, so a team whose agent runs live in unsaved chats ships changes with the same trait: wired to everything, owned by nobody. A sharper prompt template will not fix that, because the missing thing is a record, not a phrasing.

Governance is what turns one good session into a repeatable habit. Without it, every incident is "blameless" and also completely foreseeable, which is a frustrating place to keep landing.

The four contracts, by tool

Each contract maps to a tool you probably already run, and to the question reviewers keep asking.

The decision stub answers "why this approach?" Your PR template forces three lines: constraints considered, alternatives you rejected, and the proof you verified it. Review stops being a vibe check and starts being a read of explicit tradeoffs.

The scope ledger answers "what was it allowed to touch?" In Cursor, .mdc rules read as precise until reviewers argue over what they meant, and the rules quietly compete with chat memory (Cursor agent docs). So the parent chat carries five lines: goal, allowed paths, forbidden paths, verification command, merge owner. Now review checks the ledger against the diff instead of relitigating the prompt.

The CLAUDE.md supremacy clause answers "which rule wins?" Approving bash commands in Claude Code becomes muscle memory, and hooks help, but precedence still needs to live in a file. The top of CLAUDE.md states which hooks win, which folders need human eyes, and where temporary overrides go. Sessions stop inventing policy mid-run.

The replay sandwich answers "show me how you got here." Codex CLI runs merge green while reviewers never see the transcript. So AGENTS.md requires three things before the PR: an intent line, the command transcript, and a diff summary. Review becomes reproducible without anyone standing behind a terminal.

Here is a single delegation-boundary file you can drop in and adapt:

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

Test it like a reviewer

Governance is only real if it survives a stranger. Hand the PR to someone unfamiliar with the work and ask four questions. If they can answer all four from the PR itself, the contracts are doing their job.

Gate Question
Connector truth Which MCP servers fired, and were they expected?
Reviewer path Can someone unfamiliar trace intent without chat replay?
Risk routing Were red folders touched, and who approved?
Replay proof Which commands prove regression guards?

Then run this checklist on the PR body before you approve:

  • Scopes in the PR body match the folders in the diff.
  • Primary-doc links were smoke-checked after publishing edits.
  • MCP connectors mentioned (if any) list owners.
  • Verification command output is pasted or linked.

If a rule never answers a recurring question, delete it. That is how you keep this from turning into process for its own sake.

What this does not buy you

None of this replaces architecture judgement. Agents speed up execution; they do not own decisions, and a clean ledger around a bad design is still a bad design. The contracts make delegation legible, nothing more.

For the connector side, start from the MCP specification. When you need board-level language for the same ideas, NIST's AI Risk Management Framework and OWASP's LLM Top 10 both speak the right dialect. The instinct behind all of it, files over vibes, is the same one in OpenAI's skills repository.

Common questions

  • What is agentic coding governance?

    Agentic coding governance is the set of written contracts, scope ledgers, precedence clauses, replay receipts, and decision stubs, that keep agent-produced changes explainable after the session ends. It governs the delegation, not the model. The questions it answers are who allowed what, who verified it, and who owns the merge into your main branch.

  • How do you stop governance from becoming process theater?

    Anchor every rule to a reviewer question it answers. The decision stub exists because reviewers ask "why this approach?" The scope ledger exists because they ask "what was it allowed to touch?" A rule that answers no recurring question gets cut. If you cannot name the question, you do not need the rule yet.

  • Who owns agent governance on a team?

    The merge owner named in the scope ledger owns it for that change. The contract files themselves work like any shared tooling: whoever edits CLAUDE.md, AGENTS.md, or a connector card does it through a reviewed PR, not a quiet commit. Ownership is per-change at the diff, and per-file for the rules.

  • What is the first sign a team needs this?

    Diffs that bundle a refactor with an invisible intent shift. When a reviewer cannot tell which lines serve the stated goal and which just came along, the team is already paying for missing governance. The retro that names it simply has not happened yet. Start before it does.

Start with one contract

Pick the decision stub first, since it is three lines in a template and ships before your next agent run. Our training turns these four contracts into a working session against your own merge queue, and the rest of the AI coding governance cluster maps the failure modes that sit next door.

Related training topics

Related research

Continue through the research archive

Ready to start?

Transform how your team builds software.

Get in touch