
How to Stop Your AI Coding Agent from Falsely Claiming 'Done'

CodeLoop Team · April 30, 2026 · 5 min read


You ask the AI agent to add a feature. It edits five files, says "Done!", you switch to the browser, and the page is blank. The build broke three minutes ago and the agent didn't notice.

This is the most common AI-assisted-development complaint of 2026. It has a clean fix.

The fix: a hard gate the agent can't skip

The pattern has two parts:

  • A gate function that returns either ready_for_review or continue_fixing. It evaluates a confidence score across build, tests, lint, screenshots, and design diff, and returns ready_for_review only at ≥ 94% confidence.
  • A user rule that says: the agent is forbidden from declaring the task done until the gate function returns ready_for_review.

The user rule is the critical bit. Without it, the agent doesn't know the gate exists. With it, the agent reliably loops fix → verify until the gate passes.
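As a rough sketch of what such a gate function might look like (the `Signals` fields and the weights below are illustrative assumptions, not CodeLoop's actual scoring):

```python
from dataclasses import dataclass

@dataclass
class Signals:
    build_ok: bool        # did the latest build succeed?
    tests_passed: float   # fraction of tests passing, 0.0-1.0
    lint_ok: bool         # linter reported no errors
    screenshot_ok: bool   # visual/screenshot check passed
    design_diff: float    # similarity to the target design, 0.0-1.0

def gate_check(s: Signals, threshold: float = 0.94) -> str:
    """Return 'ready_for_review' only when weighted confidence clears the bar."""
    # Hypothetical weights; bools count as 0 or 1 in the weighted sum.
    confidence = (
        0.35 * s.build_ok
        + 0.30 * s.tests_passed
        + 0.10 * s.lint_ok
        + 0.10 * s.screenshot_ok
        + 0.15 * s.design_diff
    )
    return "ready_for_review" if confidence >= threshold else "continue_fixing"
```

The key design point is that the agent never sees a raw "looks fine" signal, only the binary gate verdict, so a broken build drags confidence well below the threshold no matter how clean everything else looks.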

Implementation in 60 seconds

CodeLoop ships this pattern as an MCP server:

`npx codeloop init`

That writes the user rule into both Cursor (~/.cursor/codeloop-user-rule.md, to paste into Settings → Rules) and Claude Code (~/.claude/CLAUDE.md, auto-injected).

The rule reads:

> After every code change, call codeloop_verify. If it fails, call codeloop_diagnose, fix, then re-verify. Do not declare the task done until codeloop_gate_check returns ready_for_review with confidence ≥ 94%.

That's it. From the next session onward, the agent loops on its own.

Why 94%?

We A/B-tested gate thresholds against ~5,000 PRs. Below 90%, false-positive "done" claims still happened (~5% of runs). Above 94%, the agent occasionally got stuck looping on flaky tests. 94% hit the sweet spot: fewer than 0.5% false positives, and the loop terminates within 3 fix cycles in 95% of cases.

You can override the threshold per project in .codeloop/config.json.
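As a sketch, a per-project override might look like the fragment below. The `gate.confidence_threshold` key name is a hypothetical illustration; check CodeLoop's configuration reference for the actual schema.

```json
{
  "gate": {
    "confidence_threshold": 0.97
  }
}
```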

What if my project has flaky tests?

Two paths:

  • Mark them as flaky in .codeloop/config.json: the gate-check then counts only deterministic failures.
  • Use the parent_run_id ladder: when the agent retries a flaky test, the gate-check considers the historical pass rate, not just the latest run.
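The historical-pass-rate idea can be sketched like this. The function name and the 80% threshold are assumptions for illustration, not CodeLoop internals:

```python
def effective_result(run_history: list[bool], flaky_threshold: float = 0.8) -> bool:
    """Treat a test as passing when its recent pass rate clears the threshold,
    even if the very latest run failed."""
    if not run_history:
        # No history yet: be conservative and count it as a failure.
        return False
    pass_rate = sum(run_history) / len(run_history)
    return pass_rate >= flaky_threshold

# Latest run failed, but 4 of the last 5 passed, so the test still counts
# as passing and doesn't block the gate.
```

Scoring over a window of runs instead of the single latest run is what stops one flaky failure from trapping the agent in an endless fix loop.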
Read more

- How CodeLoop's gate-check works
- Quick Start
- The full 29-tool reference

Frequently asked questions

How do I stop my AI agent from declaring done before the build works?

Use a hard gate. CodeLoop's codeloop_gate_check returns ready_for_review only at ≥ 94% confidence; the user rule forbids the agent from claiming done before that. Install with `npx codeloop init`.

What if my agent ignores the user rule?

Cursor and Claude Code reliably honour user rules in 2026. If your model is older or smaller, set CODELOOP_ENFORCE=true so codeloop_check_workflow blocks the gate-check call until evidence (screenshots, video, dev report) is captured.

Does this work with Codex / Aider / Continue?

Yes. Any MCP-speaking client honours the same user rule via its own rules system. CodeLoop ships templates for Cursor, Claude Code, Codex, and Aider.