
Agent Loops for Messy Code

Practical patterns for keeping coding agents useful on messy tasks.

Rogier Muller · March 30, 2026 · 6 min read

AI coding tools are easy to judge on simple tasks. The harder test is what happens when the work stops being neat. A feature touches multiple files. The repo has old patterns. The agent makes a plausible change, then breaks a test two steps later. That is where many tools start to feel fragile.

The useful question is not whether an agent can write code. It is whether the loop around the agent still holds up when the task gets messy. In practice, the best systems keep context small, make verification cheap, expose intermediate state, and let a human step in without restarting from scratch.

That matters more than any single model or interface. A strong model can still fail inside a weak loop. A modest model can be useful inside a loop that is easy to inspect and recover from.

What complexity breaks first

Complexity usually breaks one of four things.

First, the agent loses the shape of the task. It starts editing locally without understanding the broader constraint. This is common when the repo has multiple similar modules or when the feature request is underspecified.

Second, the agent overcommits to an early guess. It writes a patch that looks coherent in isolation but does not fit the surrounding code. This is especially common when the tool is rewarded for producing a complete answer too quickly.

Third, verification becomes expensive. If every iteration requires a full build, a long test suite, or manual inspection across several surfaces, the loop slows down enough that the agent stops being helpful.

Fourth, recovery is poor. When the agent goes wrong, teams often find that the tool has no clean way to back out, compare alternatives, or resume from a known point.

These are workflow problems before they are model problems.

Patterns that hold up

A few patterns show up across agent IDEs and CLIs.

Keep the task in small slices

Break work into changes that can be checked independently. A good slice has one main intent, one likely failure mode, and one clear verification step. If the task is too broad, the agent will often optimize for completeness instead of correctness.

In practice, this means asking for one subsystem at a time, or one behavior at a time, rather than “implement the feature.” It also means resisting the urge to let the agent roam across the repo unless the change truly needs it.
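One way to make the slice definition concrete is to write it down before handing work to the agent. The sketch below is illustrative, not part of any tool's API; the field names and the example strings are assumptions about how a team might record a slice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slice:
    intent: str        # one main intent, e.g. "add retry to the billing client"
    failure_mode: str  # the most likely way the change goes wrong
    verify: str        # the single check that confirms or refutes it

def is_checkable(s: Slice) -> bool:
    """A slice is ready for the agent only when all three fields are filled in."""
    return all((s.intent, s.failure_mode, s.verify))
```

If you cannot fill in all three fields, the task is probably still "implement the feature" in disguise and needs another round of splitting.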

Make the verification path obvious

The loop improves when the agent can run a narrow test, inspect the result, and decide the next move. If the only feedback is a giant test suite, the agent spends too much time waiting and too little time learning.

A practical setup is:

  • one fast unit or integration check
  • one targeted lint or type check if relevant
  • one human-readable diff review step

That is enough for many tasks. You do not need every loop to be exhaustive. You need it to be cheap enough that the agent can iterate.

Preserve intermediate artifacts

When the agent is working on a difficult change, keep the intermediate state visible. That can be a patch, a plan, a short note, or a branch with small commits. The point is recovery.

If the next step fails, you want to know what changed, why it changed, and what assumption was being tested. Without that, teams end up redoing work instead of correcting it.

Use the human as a constraint, not a rescuer

The best human intervention is often a narrow correction: “Do not touch that module,” “Keep the public API stable,” or “Use the existing helper instead of adding a new one.” Those constraints reduce search space and improve the odds that the agent stays on track.

This is different from asking a person to debug every failure. If the human only appears after the loop has already drifted, the tool has probably been allowed too much freedom too early.

Where tool choice matters

Different tools vary less in headline capability than in how they manage the loop.

Some are better at long-running context and multi-step edits. Some are better at quick local edits with tight review. Some expose the filesystem and shell in a way that makes recovery easier. Some make it harder to see what changed until the end.

For teams, the practical test is not “Which tool is smartest?” It is “Which tool gives us the most inspectable failure?” A tool that fails transparently is often more useful than one that fails less often but leaves no trace.

That is why a simple comparison matrix can help during evaluation:

Loop property       What to look for                        Why it matters
Context control     Can the task stay narrow?               Reduces drift
Verification cost   Can checks run quickly?                 Keeps iteration cheap
Recovery            Can you resume or revert easily?        Prevents wasted work
Visibility          Can you inspect intermediate changes?   Makes review possible
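If a team scores candidate tools against this matrix, the comparison reduces to a small computation. The tool names and scores below are placeholders for illustration, not measurements of any real product.

```python
# The four loop properties from the matrix above.
PROPERTIES = ["context_control", "verification_cost", "recovery", "visibility"]

def rank_tools(scores):
    """Rank tools by total score across the four loop properties.

    `scores` maps tool name -> {property: score}; higher is better.
    Returns a list of (tool, total) pairs, best first.
    """
    totals = {tool: sum(s[p] for p in PROPERTIES) for tool, s in scores.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

An even split across the four properties is itself a judgment call; a team that mostly fights drift might weight context control more heavily.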

Implementation steps that teams can try

Start by choosing one real task from your backlog. Do not use a toy example. Pick something with a few files, a test surface, and at least one ambiguous edge case.

Then run the task through your current agent workflow and note three things: where the agent hesitated, where verification slowed down, and where a human had to intervene.

Next, change only one variable. For example, narrow the task slice, add a faster check, or require a short plan before code changes. If the loop improves, keep that change. If it does not, revert it and try another.
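The keep-or-revert decision can be made mechanical if the team summarizes each run with the three counts from the previous step. This is a deliberately crude sketch; the metric names and the unweighted sum are assumptions, and real runs deserve more nuance than a single comparison.

```python
RUN_METRICS = ("hesitations", "slow_checks", "interventions")

def improved(baseline, variant):
    """Decide whether to keep a single workflow change.

    Each run is a dict of hand-noted counts: where the agent hesitated,
    where verification was slow, and where a human had to intervene.
    Keep the change only if the variant's total is strictly lower.
    """
    return sum(variant[k] for k in RUN_METRICS) < sum(baseline[k] for k in RUN_METRICS)
```

Because only one variable changed between the runs, a lower total can be attributed to that change rather than to noise from several edits at once.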

A useful rule is to optimize for the next reviewable step, not the final answer. That keeps the workflow grounded in evidence instead of confidence.

Tradeoffs and limits

These patterns are not free.

Smaller slices can increase coordination overhead. More frequent checks can slow down large refactors. Preserving intermediate artifacts can create clutter if teams do not clean them up. And tighter human constraints can reduce the agent’s ability to explore useful alternatives.

There is also a ceiling on what workflow design can fix. A tight loop makes failures cheaper and easier to see; it does not make an underspecified requirement precise, and it cannot give a model judgment it lacks.

One methodology lens

One useful lens from our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.

Want to learn more about Cursor?

We offer enterprise training and workshops to help your team become more productive with AI-assisted development.

Contact Us