Tools That Keep Working
A practical look at AI coding tools that stay useful after the demo.

A lot of AI coding tools look good in the first ten minutes. The real test is whether they still help after the novelty wears off. That usually means the tool can handle messy code, partial context, and a human who does not want to babysit every step.
The useful question is not “is it smart?” It is “does it fit the work?” For agentic coding, that means the tool should make it easy to inspect intent, constrain edits, verify outcomes, and recover when it goes wrong. If it cannot do those things, it may still be impressive. It is just not dependable.
What tends to hold up
The tools that last usually share a few traits. They keep the loop short. They make intermediate state visible. They let you review before merge. And they do not pretend the model can replace the surrounding engineering process.
In practice, that means a good tool should support:
- clear task boundaries
- readable diffs
- repeatable verification
- easy rollback
- low-friction handoff back to a human
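The traits above can be sketched as a single wrapper around a proposed edit. This is a minimal illustration, not any real tool's API: `propose_edit` and `check` are hypothetical callables, and the `{path: contents}` dict stands in for a worktree.

```python
def apply_with_rollback(snapshot, propose_edit, allowed, check):
    """Constrain an edit to `allowed` paths, verify it, and keep the old
    state on failure. `propose_edit` and `check` are hypothetical stand-ins;
    `snapshot` is a {path: contents} dict standing in for the worktree."""
    candidate = propose_edit(dict(snapshot))          # edit a copy, not the repo
    touched = {p for p in candidate
               if candidate.get(p) != snapshot.get(p)}
    touched |= set(snapshot) - set(candidate)         # deleted files count too
    if not touched <= set(allowed):                   # clear task boundary
        return snapshot, "rejected: edit outside allowed files"
    if not check(candidate):                          # repeatable verification
        return snapshot, "rolled back: checks failed" # easy rollback
    return candidate, "applied"
```

The point of the sketch is that rollback is the default path, not an emergency procedure: a failed check returns the original state untouched.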
That sounds basic, but many tools fail on at least one of these. Some are strong at generating code and weak at checking it. Some are good in a clean repo and fall apart when the codebase is old, large, or inconsistent. Some produce plausible output but make it hard to see what changed.
The loop matters more than the model
For agentic coding teams, the workflow is the product. A strong model inside a weak loop still creates friction. A decent model inside a good loop can be surprisingly effective.
The loop usually has four steps: plan, edit, verify, review. Tools that hold up make each step cheap.
Plan should be short and concrete. The tool should help break a task into small actions without turning the session into a strategy meeting. Edit should be scoped. Verify should run in the same place the code lives, not in a separate ritual. Review should show exactly what changed and why.
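The four steps can be written out as a loop. This is a sketch under assumptions, not a real agent: all five callables (`plan`, `edit`, `verify`, `review`) are hypothetical stand-ins for tool capabilities, and the retry budget is illustrative.

```python
def agent_loop(task, plan, edit, verify, review, max_rounds=3):
    """The plan -> edit -> verify -> review loop sketched as code.
    All callables are hypothetical stand-ins for tool capabilities."""
    change = None
    for step in plan(task):                # plan: short and concrete
        change = edit(step)                # edit: scoped to one step
        for _ in range(max_rounds):
            result = verify(change)        # verify: where the code lives
            if result.ok:
                break
            # recover: feed the failure back instead of retrying blind
            change = edit(step, feedback=result.log)
        else:
            return None                    # out of budget: hand back to a human
    return review(change)                  # review: exact diff plus rationale
```

Note what makes each step cheap here: the plan is just a list, failures loop back into the edit with context, and running out of retries is a handoff, not a crash.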
When one of those steps is missing, the burden shifts back to the developer. That is where many tools lose value. They create more thinking, not less.
Where tools usually break
The failure modes are predictable.
First, they overreach. A tool that tries to refactor too much at once often creates a diff that is hard to trust. Second, they under-verify. If the tool cannot run tests, inspect failures, or re-check its own work, the human has to do that manually. Third, they drift in context. Once the task spans multiple files or a long session, the tool may keep producing locally plausible changes that do not fit the broader codebase.
There is also a quieter failure: the tool may be technically correct but operationally annoying. If it is slow to start, hard to steer, or noisy in its output, teams stop using it for real work.
A practical way to evaluate one
If you are testing a new coding agent or IDE, do not start with a toy prompt. Use a task that resembles your actual work: a bug fix, a small feature, or a test repair in a repo with real constraints.
Then watch for a few things.
Does it ask for the right amount of context, or does it demand too much upfront? Does it make a small first change, or does it jump straight into broad edits? Can you see the plan and the diff clearly? Can it run the relevant checks without extra setup? When a check fails, does it recover in a way that makes sense?
If the answer is mostly yes, the tool may be worth adopting. If the answer is no, the demo probably hid the weak parts.
What teams should standardize
Teams do better when they standardize the surrounding workflow instead of arguing about prompts. A few conventions help a lot:
- define what counts as a finished task
- require tests or checks for any non-trivial change
- keep tasks small enough to review in one pass
- make rollback easy
- document the expected handoff from agent to human
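These conventions are concrete enough to encode as a merge gate. A minimal sketch, assuming a line-count threshold as the proxy for "reviewable in one pass"; the names and the `400`-line cutoff are illustrative, not a recommendation.

```python
def ready_for_review(diff_line_count, has_checks, trivial=False,
                     max_diff_lines=400):
    """Sketch of the conventions above as a handoff gate.
    The threshold is illustrative; pick whatever 'reviewable in
    one pass' means for your team."""
    if diff_line_count > max_diff_lines:
        return False, "split the task: too large to review in one pass"
    if not trivial and not has_checks:     # tests required for non-trivial work
        return False, "add a test or check before handoff"
    return True, "ready for human review"
```

Encoding the rule matters less than agreeing on it: once the gate exists, the argument moves from "is this diff fine?" to "is the threshold right?", which is a better argument to have.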
This is where our methodology is useful: the Build step is not just about generating code. It is about making the change legible enough that another person can trust it.
Tradeoffs to accept
No tool removes review. That is the main tradeoff. Even when a tool moves fast, you still need a way to catch wrong assumptions, stale context, and overconfident edits.
There is also a cost to tighter control. The more you constrain the tool, the less magical it feels. But that is often the point. In engineering work, boring reliability beats impressive variance.
The best tools are not the ones that look smartest in a demo. They are the ones that keep working when the repo is ugly, the task is narrow, and the reviewer is tired. That is a much harder standard, but it is the one that matters.
Want to learn more about Cursor?
We offer enterprise training and workshops to help your team become more productive with AI-assisted development.
Contact Us