Codex workspace agents need repo rules

Codex workspace agents and Cursor cloud agents need repo rules: scoped boundary files, connector cards, and replay receipts reviewers can check.

Scenes from the Tale of Genji, landscape painting by Tosa School (1700).
Rogier Muller · May 14, 2026 · 6 min read

Codex workspace agents need repo rules before you let them open PRs on their own: a written boundary that says what is allowed, what is forbidden, and how the work gets verified. A workspace rule is a repo-level file the agent reads before it acts, like a Cursor .mdc scope, a CLAUDE.md precedence note, or an AGENTS.md verification block. Codex CLI, OpenAI's coding agent, will happily hand back a clean diff overnight, but a clean diff is not a reviewable one. The thing you are short on is not model quality. It is whether anyone can check what came back.

Why overnight agent PRs are hard to review

A cloud agent will open a PR with a tidy diff, a confident description, and a decision trail that lives nowhere the reviewer can reach. The agent optimizes for finishing its task. You hold the risk. Without a written contract in the repo, the agent's convenience wins by default.

Smaller tasks do not save you here. We assumed they would, ran cohorts before scopes lived in plain files, and watched the small tasks merge with the same fuzzy trail as the big ones. The bottleneck had moved from typing speed to traceability, and nothing about task size touched that.

So the rule is simple. If your repo cannot state its boundaries plainly, the agent will guess, and guessing scales badly.

Write four rules the agent reads and the reviewer can check

A workspace rule only counts if the agent reads it before acting and the reviewer can check it after. Cursor publishes its own cloud agent best practices; use those alongside repo rules, not instead of them. Here are four common failure modes and the file you write to fix each one.

Codex replay gaps. Lean on Codex CLI and you will merge green builds whose transcript no reviewer ever saw. The fix is a replay sandwich: AGENTS.md mandates an intent line, then the command transcript, then a diff summary before the PR. Review becomes reproducible without anyone standing behind a terminal.
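One way to make the replay sandwich checkable is a small script that CI runs against the PR body. This is a minimal sketch, not a Codex feature: the section labels `Intent:`, `Transcript:`, and `Diff summary:` and the function name are assumptions for illustration, so substitute whatever labels your AGENTS.md mandates.

```python
# Hypothetical section labels; the replay sandwich names the three parts,
# not their exact spelling.
REQUIRED_SECTIONS = ("Intent:", "Transcript:", "Diff summary:")


def check_replay_sandwich(pr_body: str) -> list[str]:
    """Return the problems found; an empty list means all three parts
    are present and appear in the mandated order."""
    lines = pr_body.splitlines()
    problems = []
    positions = []
    for label in REQUIRED_SECTIONS:
        # Index of the first line that starts with this label, if any.
        idx = next(
            (i for i, line in enumerate(lines) if line.strip().startswith(label)),
            None,
        )
        if idx is None:
            problems.append(f"missing section: {label}")
        else:
            positions.append(idx)
    # Only check ordering when every section was found.
    if not problems and positions != sorted(positions):
        problems.append("sections out of order")
    return problems
```

A non-empty return value is a reason to send the PR back before review starts, which keeps the check cheap and mechanical.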

MCP connectors reaching too far. Connectors built on the Model Context Protocol default to capability demos, and one of them eventually touches data nobody put on the diagram. The fix is a connector card: one markdown card per server listing allowed actions, forbidden actions, owner, and rollback. Incidents shrink once operators know what "off" looks like.
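A connector card can also be mirrored as data so a wrapper or a lint step can enforce it. The sketch below assumes a hypothetical `ConnectorCard` type and action names; none of this is part of the Model Context Protocol itself, it only encodes the four fields the card lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorCard:
    """Hypothetical in-code mirror of one markdown connector card."""
    server: str
    allowed: frozenset[str]
    forbidden: frozenset[str]
    owner: str
    rollback: str

    def permits(self, action: str) -> bool:
        # Deny by default: an action must be listed as allowed,
        # and an explicit forbid always wins.
        return action in self.allowed and action not in self.forbidden


def card_problems(card: ConnectorCard) -> list[str]:
    """Flag cards a reviewer should reject as incomplete or contradictory."""
    problems = []
    if not card.owner.strip():
        problems.append("no owner")
    if not card.rollback.strip():
        problems.append("no rollback")
    overlap = card.allowed & card.forbidden
    if overlap:
        problems.append(f"allowed and forbidden overlap: {sorted(overlap)}")
    return problems
```

Treating unlisted actions as refused is the design choice that matters here: a connector that grows a new capability stays off until someone edits the card.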

Blurry recursive handoffs. Chained agents return summaries that quietly omit the paths a child agent owned. The fix is a child receipt block: every child returns the paths it touched, the commands it ran, and the tests proving its regression guards. Parents stop green-lighting diffs they cannot see.
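The child receipt block lends itself to a mechanical cross-check: every path in the final diff should appear in some child's receipt. The following is a sketch under assumptions; `ChildReceipt` and both helper names are hypothetical, and the diff paths would come from your own tooling.

```python
from dataclasses import dataclass


@dataclass
class ChildReceipt:
    """Hypothetical receipt a child agent returns to its parent."""
    agent: str
    paths: set[str]       # paths the child touched
    commands: list[str]   # commands the child ran
    tests: list[str]      # tests proving its regression guards


def unreceipted_paths(diff_paths: set[str], receipts: list[ChildReceipt]) -> set[str]:
    """Paths present in the diff that no child claimed."""
    covered = set().union(*(r.paths for r in receipts))
    return diff_paths - covered


def receipts_without_tests(receipts: list[ChildReceipt]) -> list[str]:
    """Agents that touched paths but listed no proving tests."""
    return [r.agent for r in receipts if r.paths and not r.tests]
```

If either helper returns anything, the parent is looking at a diff it cannot fully account for and should not approve it yet.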

Review-queue theater. CI is green and reviewers still ask "why this approach?" with no written answer. The fix is a decision stub: the PR template forces three lines covering constraints considered, rejected alternatives, and verification proof. The debate moves from vibes to written tradeoffs.

Here is a boundary snapshot that covers all three tools, including Claude Code, Anthropic's coding agent. Paste it, then adapt the globs to your repo.

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

The wider playbook lives under agentic coding governance, and the connector half of this argument is in MCP for team workflows.

Check the merge against four questions

When an agent's PR lands, a reviewer should be able to pass or fail it on four questions. Print this and put it next to the merge button.

| Gate | Question |
| --- | --- |
| Reviewer path | Can someone unfamiliar trace intent without chat replay? |
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove the regression guards? |
| Receipt match | Does the PR body list scopes plus the verification transcript? |

If any answer is missing, the gate has done its job: the diff goes back before it goes in.
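The risk-routing question is the easiest of the four to automate. Here is a minimal sketch, assuming a hypothetical list of red-folder globs and a list of named approvers pulled from the PR; the glob values are placeholders, not a recommendation.

```python
from fnmatch import fnmatch

# Hypothetical red folders; replace with the paths your team treats as high risk.
RED_GLOBS = ("infra/*", "billing/*")


def risk_routing(changed_paths, approvers, red_globs=RED_GLOBS):
    """Return (passed, touched_red_paths).

    The gate fails only when a red path was touched and nobody is
    named as approver. Note that fnmatch's `*` also matches `/`,
    so "infra/*" covers nested files.
    """
    touched = sorted(
        p for p in changed_paths if any(fnmatch(p, g) for g in red_globs)
    )
    if touched and not approvers:
        return False, touched
    return True, touched
```

The other three questions need a human to judge the answer, but this one can block the merge on its own.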

Common questions

  • What rules do Codex workspace agents need before they run?

    Codex workspace agents need AGENTS.md to carry replay-friendly verification notes: an intent line, the command transcript, and a diff summary before the PR opens. Without that block, CLI convenience hides verification theater, where the commands ran but the narrative never reached the reviewer. The replay sandwich makes the run reproducible from the PR alone.

  • What is the best practice for Cursor cloud agents?

    Keep scopes explicit in .mdc files and forbid undeclared MCP domains, so a reviewer can compare the declared boundary against the actual diff. The reviewer-path gate asks one question: can someone unfamiliar trace intent without replaying the chat? Risk routing matters too, since any red folder a cloud agent touches needs a named approver.

  • How do teams keep MCP connectors from touching unlisted data?

    Write one connector card per MCP server listing allowed actions, forbidden actions, owner, and rollback. Connectors default to capability demos, so least privilege needs an explicit trust boundary written down somewhere the team reads. Once the card exists, incidents shrink because operators finally know what "off" looks like for that server.

  • Why do cloud agents create silent rework?

    Cloud agents create silent rework because delegation without boundaries moves the bottleneck from typing speed to traceability. The agent finishes its task and hands you a diff you cannot trace. Smaller tasks do not fix it; scopes have to live in plain, boring files before reviewers can trust the output at all.

Start with one rule this week

Pick the failure mode that bit you last and turn its fix into a real file: a .mdc scope, an AGENTS.md note, or a connector card before the next unattended run. If overnight agent PRs already land faster than your reviewers can own them, contact us and we will help you write workspace rules against your real repos, not a demo.
