
AI agent guardrails: why every harness needs them

Guardrails turn an agent's complete-sounding summaries into receipts reviewers can actually verify.

Rogier Muller · May 4, 2026 · 5 min read

An AI agent guardrail is a repo-level rule that limits what a coding agent may do before a human reviews the result. You need them because the riskiest agent run is not the one that fails loudly; it is the one that reports success in prose nobody can verify. Whether you drive Cursor (Anysphere's AI code editor) or Claude Code (Anthropic's coding agent), the fix is the same: shrink the scope and write down the receipts.

We learned this the boring way. During a readiness drill, every sub-agent reported success. None of them reported which files they had touched, and the merge queue was waiting on us to vouch for the whole batch. The models were fine. The reviewable story was missing.

Write down which files each agent touched

When you chain agents, the parent summary tends to drop the paths the children owned. A report can sound complete and still skip entire directories.

Fix it with a child receipt block. Every child agent returns the paths it touched, the commands it ran, and the tests that prove the regression guards held. The parent stops green-lighting diffs it cannot see.

This turns delegation from a trust exercise into something a reviewer can check line by line. If a child cannot produce its own receipt, that is the signal to look closer.
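A minimal sketch of the parent-side check, assuming each child returns its receipt as a small structured record. `ChildReceipt`, `unowned_paths`, and `receipts_missing_proof` are hypothetical names for illustration, not part of Cursor, Claude Code, or any other harness API. The parent compares the paths in the combined diff against what the children declared and flags anything no receipt claims.

```python
from dataclasses import dataclass, field


@dataclass
class ChildReceipt:
    """Hypothetical receipt a child agent returns to its parent."""
    agent: str
    paths_touched: list[str] = field(default_factory=list)
    commands_run: list[str] = field(default_factory=list)
    regression_tests: list[str] = field(default_factory=list)


def unowned_paths(diff_paths: list[str], receipts: list[ChildReceipt]) -> list[str]:
    """Return diff paths that no child receipt claims, sorted."""
    claimed = {path for receipt in receipts for path in receipt.paths_touched}
    return sorted(set(diff_paths) - claimed)


def receipts_missing_proof(receipts: list[ChildReceipt]) -> list[str]:
    """Return agents whose receipt lists no commands or no regression tests."""
    return [r.agent for r in receipts if not r.commands_run or not r.regression_tests]


# Illustrative data: two children, one diff that touches a path neither claimed.
receipts = [
    ChildReceipt("api-child", ["src/api/routes.py"], ["pytest tests/api"], ["tests/api/test_routes.py"]),
    ChildReceipt("docs-child", ["docs/usage.md"]),
]
diff = ["src/api/routes.py", "docs/usage.md", "src/billing/invoice.py"]

print(unowned_paths(diff, receipts))     # ['src/billing/invoice.py']
print(receipts_missing_proof(receipts))  # ['docs-child']
```

Any non-empty result is the signal to look closer before the parent green-lights the batch.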

Make the PR explain its own decisions

Continuous integration can be green while reviewers still ask "why this approach?" and find no written answer. People optimize for the checks passing, so the reasoning stays in someone's head.

Add a decision stub to your PR template. Three lines: constraints considered, rejected alternatives, verification proof. That is enough to move the conversation from vibes to an actual tradeoff anyone can read later.
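A CI step could refuse PRs whose decision stub is missing or blank. This is a sketch that assumes the template uses three labeled lines such as `Constraints considered: ...`; the labels and the `missing_decision_fields` helper are hypothetical, so match them to your own template.

```python
import re

# Hypothetical labels for the three-line decision stub.
DECISION_FIELDS = ("Constraints considered", "Rejected alternatives", "Verification proof")


def missing_decision_fields(pr_body: str) -> list[str]:
    """Return decision-stub labels that are absent or left blank in the PR body."""
    missing = []
    for label in DECISION_FIELDS:
        # "Label:" at the start of a line, followed by at least one visible
        # character on the SAME line ([ \t] avoids spilling onto the next line).
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*\S"
        if not re.search(pattern, pr_body, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(label)
    return missing


body = """Constraints considered: must stay on Postgres 14
Rejected alternatives:
Verification proof: pytest -q passed, log linked below
"""
print(missing_decision_fields(body))  # ['Rejected alternatives']
```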

Give each harness a written scope

Different tools blur scope in different ways, so write the boundary down where the agent will read it.

In Cursor, .mdc rule language sounds precise until reviewers argue about what it meant (Cursor agent docs). Carry a five-line scope ledger in the parent chat instead: goal, allowed paths, forbidden paths, verification command, merge owner. Review then checks the ledger against the diff rather than re-litigating the prompt.
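Checking the ledger against the diff can be mechanical. The sketch below models the five ledger fields as a dataclass and splits diff paths into forbidden hits and undeclared paths. `ScopeLedger` and `out_of_scope` are hypothetical names, the glob semantics come from Python's `fnmatch` (where `*` also matches `/`), and the rule that forbidden globs win over allowed ones is an assumption of this sketch.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class ScopeLedger:
    """Hypothetical five-line scope ledger."""
    goal: str
    allowed_paths: list[str]      # glob patterns
    forbidden_paths: list[str]    # glob patterns; these win over allowed ones
    verification_command: str
    merge_owner: str


def _matches(path: str, patterns: list[str]) -> bool:
    # fnmatchcase's "*" also matches "/", so "src/api/*" covers nested files too.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def out_of_scope(ledger: ScopeLedger, diff_paths: list[str]) -> dict[str, list[str]]:
    """Split diff paths into forbidden hits and paths outside every allowed glob."""
    forbidden = [p for p in diff_paths if _matches(p, ledger.forbidden_paths)]
    undeclared = [
        p for p in diff_paths
        if p not in forbidden and not _matches(p, ledger.allowed_paths)
    ]
    return {"forbidden": forbidden, "undeclared": undeclared}


ledger = ScopeLedger(
    goal="Add pagination to the orders endpoint",
    allowed_paths=["src/orders/*", "tests/orders/*"],
    forbidden_paths=["migrations/*", "src/auth/*"],
    verification_command="pytest tests/orders",
    merge_owner="@orders-team",
)
diff = ["src/orders/api.py", "tests/orders/test_api.py", "src/auth/session.py", "README.md"]
print(out_of_scope(ledger, diff))
# {'forbidden': ['src/auth/session.py'], 'undeclared': ['README.md']}
```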

In Claude Code, bash approvals turn into muscle memory on a shared laptop (Claude Code docs). Put a precedence clause at the top of CLAUDE.md stating which hooks win, which folders need human eyes, and where temporary overrides live. Sessions stop inventing policy mid-run because the order is already written.
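To keep the precedence clause from drifting, a repo could lint the top of CLAUDE.md. This sketch assumes a house convention that is not required by Claude Code: the file opens with a `## Precedence` heading followed by three labeled lines, `Hooks:`, `Human review:`, and `Overrides:`. Both the convention and the `precedence_problems` helper are hypothetical.

```python
# Hypothetical convention: CLAUDE.md opens with a "## Precedence" section holding
# three labeled lines. Claude Code itself does not require this structure.
REQUIRED_LABELS = ("Hooks:", "Human review:", "Overrides:")


def precedence_problems(claude_md: str) -> list[str]:
    """Return problems with the precedence clause at the top of CLAUDE.md."""
    lines = [line.strip() for line in claude_md.splitlines() if line.strip()]
    if not lines or lines[0].lower() != "## precedence":
        return ["first non-empty line must be '## Precedence'"]
    # The section runs until the next heading (or end of file); list markers
    # like "- " are stripped so bulleted and plain lines both count.
    section = []
    for line in lines[1:]:
        if line.startswith("#"):
            break
        section.append(line.lstrip("-* "))
    return [
        f"missing '{label}' line"
        for label in REQUIRED_LABELS
        if not any(line.startswith(label) for line in section)
    ]


example = """## Precedence
- Hooks: the repo's pre-commit hook wins over chat instructions.
- Human review: migrations/, src/auth/
## Style
- Overrides: this line is outside the precedence section, so it does not count
"""
print(precedence_problems(example))  # ["missing 'Overrides:' line"]
```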

Here is a small Cursor rule snapshot (`.mdc` frontmatter plus per-harness notes) you can adapt per repo:

```markdown
---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.
```

Codex CLI, OpenAI's coding agent, reads AGENTS.md much as Claude Code reads CLAUDE.md, so keep the verification notes there replay-friendly for command-line runs.
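One way to make a verification note replay-friendly, sketched under assumptions: run the command from an argument list rather than a shell string, and keep the exact command, exit code, and output together. `VerificationRecord` and `run_verification` are hypothetical names and the transcript layout is a plain-text convention of this sketch, not something Codex CLI requires; `subprocess.run` and `shlex.join` are standard-library calls (Python 3.8+).

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class VerificationRecord:
    """Hypothetical transcript entry for AGENTS.md notes or a PR body."""
    command: str
    exit_code: int
    output: str

    def as_text(self) -> str:
        # "$ <command>" is the line a reviewer can paste into a POSIX shell to replay.
        return f"$ {self.command}\n[exit {self.exit_code}]\n{self.output.rstrip()}"


def run_verification(argv: list[str]) -> VerificationRecord:
    """Run a verification command without a shell and capture what it printed."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return VerificationRecord(
        command=shlex.join(argv),  # quoted so the exact argv survives copy-paste
        exit_code=completed.returncode,
        output=completed.stdout + completed.stderr,
    )


record = run_verification([sys.executable, "-c", "print(42)"])
print(record.exit_code)  # 0
print(record.as_text())
```

Avoiding `shell=True` keeps the recorded command identical to what actually ran, which is what makes the note replayable.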

Run a short audit before you merge

A guardrail you cannot audit is decoration. These four questions keep the audit short.

| Gate | Question |
| --- | --- |
| Reviewer path | Can someone unfamiliar with the run trace its intent without replaying the chat? |
| Risk routing | Were red folders touched, and who approved the change? |
| Replay proof | Which commands prove the regression guards held? |
| Receipt match | Does the PR body list the scopes plus the verification transcript? |
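The risk-routing gate is the easiest one to automate. A minimal sketch, assuming the red folders are listed as globs (the `RED_FOLDERS` values are hypothetical) and that approvals are recorded as a path-to-reviewer mapping, which is an assumption about how your team captures sign-off:

```python
from fnmatch import fnmatchcase

# Hypothetical red-folder globs; in practice they would come from the repo's
# precedence clause or scope ledger.
RED_FOLDERS = ["migrations/*", "src/auth/*", "infra/*"]


def unapproved_red_paths(diff_paths: list[str], approvals: dict[str, str]) -> list[str]:
    """Return red-folder paths in the diff that have no named human approver.

    approvals maps a path to the reviewer who explicitly acknowledged it.
    """
    return [
        path for path in diff_paths
        if any(fnmatchcase(path, glob) for glob in RED_FOLDERS)
        and not approvals.get(path)
    ]


example_diff = ["src/auth/session.py", "migrations/0042_add_index.sql", "src/orders/api.py"]
example_approvals = {"src/auth/session.py": "@security-lead"}
print(unapproved_red_paths(example_diff, example_approvals))
# ['migrations/0042_add_index.sql']
```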

Paste this checklist into the PR and tick it off before approval:

  • Verification command output is pasted or linked.
  • Forked agent work lists parent plus child responsibilities.
  • Red-folder paths received explicit human acknowledgement.
  • Scopes in the PR body match folders in the diff.
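If the checklist lives in the PR body, a bot can block approval until every item is ticked. This sketch assumes GitHub-style task-list syntax (`- [ ]` unticked, `- [x]` ticked) and exact item text; `unticked` is a hypothetical helper, and exact matching is brittle, so keep the template wording identical to `CHECKLIST`.

```python
import re

# The four checklist items, as they would appear in a task-list PR template.
CHECKLIST = (
    "Verification command output is pasted or linked.",
    "Forked agent work lists parent plus child responsibilities.",
    "Red-folder paths received explicit human acknowledgement.",
    "Scopes in the PR body match folders in the diff.",
)

# "- [x] text" or "* [ ] text" on a single line; [ \t] keeps matches on one line.
TASK_LINE = re.compile(r"^[ \t]*[-*][ \t]+\[([ xX])\][ \t]+(.+?)[ \t]*$", re.MULTILINE)


def unticked(pr_body: str) -> list[str]:
    """Return checklist items that are missing from the PR body or not ticked."""
    ticked = {text for mark, text in TASK_LINE.findall(pr_body) if mark in "xX"}
    return [item for item in CHECKLIST if item not in ticked]


body = """- [x] Verification command output is pasted or linked.
- [x] Forked agent work lists parent plus child responsibilities.
- [ ] Red-folder paths received explicit human acknowledgement.
"""
for item in unticked(body):
    print("Not ticked:", item)
```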

If your repo cannot state its boundaries plainly, agents will guess, and guessing scales poorly. The Model Context Protocol (MCP) specification defines the connector surface, and NIST's AI Risk Management Framework gives leadership the words for the rest.

Common questions

  • What are AI agent guardrails in practice?

    AI agent guardrails are written, repo-level limits on agent behavior: a scope ledger naming allowed and forbidden paths, a precedence clause stating which hooks win, receipt formats for delegated work, and a decision stub in the PR template. They live in files, not in chat, so they survive the session and the next reviewer can read them.

  • Why do sub-agent summaries need receipts?

    Because summaries compress away the one thing reviewers need: which paths each child owned. A complete-sounding report can omit entire directories. A child receipt block listing paths touched, commands run, and regression tests turns delegation from a trust exercise into something a parent can check line by line.

  • Which guardrail should a team add first?

    The decision stub, because it costs three lines in a PR template and immediately exposes the runs nobody can explain. Constraints considered, rejected alternatives, verification proof. Once those answers feel easy to produce, add the scope ledger and the per-harness rules. Start small so the habit sticks.

  • Do guardrails slow down agent-assisted delivery?

    They cost seconds at run time and save hours at review time. The expensive path is the current one: reviewers reconstructing intent from chat archaeology, or waving through diffs they cannot defend. A five-line ledger is cheaper than one foreseeable incident retro, and far cheaper than a rollback.

Start with one receipt

Add a child receipt block to your delegation prompt today, then grow it into a scope ledger and a PR decision stub. If you want the full rollout order with your team, our AI coding governance training walks through it hands-on.
