
Browser Checks for Coding Agents

Let coding agents verify UI changes in a real browser, then patch based on what they see.

Rogier Muller · March 26, 2026 · 5 min read

Coding agents are good at writing code. They are less reliable at knowing whether it works in the browser users see. That gap shows up in forms, navigation, hydration issues, layout regressions, and the small interaction bugs that unit tests often miss.

A useful pattern is to let the agent test its own changes in a real browser, then feed the result into the next edit. The goal is to shorten the loop between code, execution, and correction.

This shows up in browser-backed test runners and Playwright-based workflows. The setup can vary, but the loop is the same: make a change, open the app in a real browser, inspect the result, and decide whether to patch or stop.

Why browser verification helps

Agents work better when they get a clear feedback signal. A browser gives them one. Instead of guessing from static code, the agent can see visible failures: a button is hidden, a modal does not open, a route breaks, or a component renders differently than expected.

That matters because many coding failures are integration errors, not syntax errors. The code compiles, but the page does not behave correctly. Browser verification catches more of those failures than a text-only loop.

It also cuts down on overconfidence. An agent that can inspect the result is less likely to stop after a plausible edit that never worked.

What the workflow looks like

A practical loop is simple:

  • Make one focused code change.
  • Run the app or test target.
  • Open the relevant page in a real browser.
  • Check the visible outcome against the task.
  • If it fails, patch the code and test again.
  • Stop when the browser behavior matches the goal.
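The loop above can be sketched as a bounded verify-patch cycle. This is a minimal sketch, not a real agent harness: `apply_patch` and `browser_check` are hypothetical callables supplied by whatever tooling drives the agent, and the result shape is an assumption.

```python
# Minimal sketch of the verify-patch loop. apply_patch() and
# browser_check() are hypothetical helpers from the agent harness;
# browser_check() is assumed to return {"passed": bool, "error": str|None}.

MAX_ATTEMPTS = 3  # bounded retries prevent endless self-correction


def verify_loop(task, apply_patch, browser_check):
    """Run up to MAX_ATTEMPTS edit/verify cycles for one focused task."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        apply_patch(task)                 # make one focused change
        result = browser_check(task)      # open the page, inspect the outcome
        if result["passed"]:
            return {"done": True, "attempts": attempt}
        # Feed the visible failure back into the next edit.
        task = {**task, "last_error": result["error"]}
    # Out of attempts: hand the task back to a human.
    return {"done": False, "attempts": MAX_ATTEMPTS}
```

The explicit attempt cap is the important part: without it, a flaky page can keep the agent chasing noise indefinitely.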

Keep the loop narrow. Agents do better when they verify one page, one interaction, or one bug at a time. Broad prompts like “fix the app” leave too much room for drift.

In practice, this works best when the agent has a browser automation layer that can report what happened in plain terms. A screenshot alone is often not enough. The agent needs a way to inspect state, read errors, or confirm that a click produced the expected result.
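One way to give the agent a signal richer than a screenshot is to collapse what the browser layer observed into a plain-text report. A minimal sketch, with hypothetical field names; in a real setup the inputs would come from something like Playwright's console-message events and click assertions:

```python
def browser_report(url, console_errors, click_outcomes):
    """Summarize a browser check in plain terms an agent can act on.

    console_errors: error strings captured from the page console.
    click_outcomes: selector -> whether the click had the expected effect.
    """
    lines = [f"Checked {url}"]
    for err in console_errors:
        lines.append(f"console error: {err}")
    for selector, ok in click_outcomes.items():
        lines.append(f"click {selector}: {'ok' if ok else 'no visible effect'}")
    passed = not console_errors and all(click_outcomes.values())
    lines.append(f"verdict: {'PASS' if passed else 'FAIL'}")
    return "\n".join(lines)
```

A report like this is easy for the agent to parse and easy for a human to skim when reviewing what the agent actually verified.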

Where it fits in the stack

This pattern is most useful for frontend work, but it is not limited to frontend teams. Any workflow with a browser-visible surface can benefit: internal tools, admin panels, documentation sites, checkout flows, and embedded web apps.

It also fits alongside existing test layers rather than replacing them. Unit tests still catch logic errors quickly. Integration tests cover API and component boundaries. Browser verification adds the last mile: what a user actually sees.

For teams using agent IDEs or CLIs, the implementation can differ while the pattern stays the same. One setup may launch a browser after each edit. Another may run a Playwright script on demand. Another may ask the agent to inspect a local page through a browser tool. The common thread is closed-loop verification.

Tradeoffs and limits

This pattern is not free.

Browser runs are slower than static checks. They can also be flaky if the app depends on timing, animations, or unstable selectors. If the agent can keep retrying without a stop rule, it can waste time chasing noise.

There is also a scope problem. A browser can confirm that a page looks right and basic interactions work, but it cannot prove the whole system is correct. It will not replace backend tests, contract tests, or human judgment on product behavior.

Another limit is prompt discipline. If the agent is not told what done means, it may keep exploring instead of verifying. A good stop condition is concrete: the page loads, the form submits, the error disappears, or the route renders the expected state.
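A stop condition like that can be encoded as explicit predicates over observed browser state rather than left to the agent's judgment. A sketch, assuming a hypothetical `state` dict populated by the browser layer:

```python
def is_done(state):
    """Concrete stop condition: every predicate must hold on observed state."""
    checks = {
        "page loads": state.get("status_code") == 200,
        "no console errors": not state.get("console_errors"),
        "expected text rendered": state.get("expected_text_found", False),
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (len(failed) == 0, failed)
```

Returning the names of failed checks gives the agent something specific to patch against, instead of a bare pass/fail.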

Implementation steps that hold up

Start with one high-value path. Pick a page or flow that breaks often and is easy to observe.

Then define the verification target in plain language. For example: “Open the settings page, change the theme, reload, and confirm the selection persists.” The more specific the target, the better the agent can test it.

Use stable selectors and deterministic test data where possible. Browser loops fail when the agent has to infer too much from a noisy interface.

Keep retries bounded. If the first browser pass fails, let the agent inspect the error and patch once or twice. After that, hand it back to a human. That prevents endless self-correction loops.

Finally, log the result in a way the team can review later. A short note about what was tested, what failed, and what changed is often enough.

A practical rule of thumb

If the bug is visible in a browser, let the agent see it in a browser. If it is not visible there, use a different test layer.

That split keeps the workflow honest. It also helps teams avoid overusing browser automation for problems that are better handled with unit or integration tests.

Methodology note

This is the Test step in our methodology. The useful question is not whether the agent wrote code, but whether it can verify the result against the intended behavior.

Bottom line

Letting agents test their own code in a real browser is not a fix for everything. It is a practical way to tighten the loop on UI work and catch failures that static analysis misses. Used with clear scope, stable selectors, and a stop rule, it can make agentic coding more dependable.
