Core concepts
You don't need to read this to use CodeLoop. After npx codeloop init your AI agent already runs this loop on its own. This page is here for the curious, for debugging, and for whoever has to explain CodeLoop in a meeting. Skip to the Quick Start if you just want it working.
CodeLoop has a small vocabulary you will see across the docs, the MCP tools, the dashboard, and the GitHub Action. Five minutes here saves you an hour later — but only if you actually want to know.
The loop
Every CodeLoop interaction follows the same three-step shape:
```
verify → diagnose → gate-check
   ↑                       ↓
   └── repair until green ──┘
```

- Verify runs your existing build, lint, tests, and (for UI projects) screenshots / recordings. It does not interpret anything — it produces facts.
- Diagnose reads the verify output, classifies each failure by severity (blocker, critical, warning, info), pins the root cause to specific files, and emits a concrete repair task list for the agent.
- Gate-check compares the run against your spec and acceptance checklist and returns a single confidence score plus a recommendation: ready_for_review, continue_fixing, or blocked.
The agent fixes whatever diagnose flagged, calls verify again, and only stops when gate-check says ready_for_review with confidence above your threshold (default 94 %).
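If you prefer code to diagrams, the same loop looks roughly like the TypeScript sketch below. callTool stands in for whatever MCP client the agent uses, and the result fields (run_id, recommendation, confidence, tasks) are illustrative names rather than the exact tool schemas.

```typescript
// Rough sketch of the repair loop, not the actual agent implementation.
async function repairUntilGreen(
  callTool: (name: string, args?: object) => Promise<any>
): Promise<string> {
  const threshold = 94; // default gate threshold

  while (true) {
    // 1. Verify: run build / lint / tests and collect facts into a new, immutable run.
    const run = await callTool("codeloop_verify");

    // 2. Gate-check: score the run against the spec and acceptance checklist.
    const gate = await callTool("codeloop_gate_check", { run_id: run.run_id });
    if (gate.recommendation === "ready_for_review" && gate.confidence >= threshold) {
      return run.run_id; // stop: hand off for human review
    }

    // 3. Diagnose: turn failures into a concrete repair task list,
    //    then the agent applies fixes and the loop repeats.
    const diagnosis = await callTool("codeloop_diagnose", { run_id: run.run_id });
    await applyFixes(diagnosis.tasks); // the agent's own coding step, not a CodeLoop tool
  }
}

// Placeholder for the agent's code-editing work.
declare function applyFixes(tasks: unknown[]): Promise<void>;
```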
Run
A run is one invocation of codeloop_verify. Every run gets a stable run_id (e.g. run_1777327112458_mlpm3a) that flows through the rest of the loop — you pass it to diagnose, gate_check, release_readiness, and the dashboard URL.
Runs are immutable. Re-running verify produces a new run_id; nothing is overwritten in place. This is what makes the dashboard timeline meaningful and what lets you compare last week's confidence to today's.
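A minimal sketch of what that buys you: because runs sit side by side on disk, comparing confidence over time is just reading each run's gate.json. The confidence field name here is an assumption for illustration.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Read every run's gate evidence and return a simple timeline of scores.
function confidenceTimeline(root = "artifacts/runs"): Array<{ runId: string; confidence: number }> {
  return readdirSync(root)
    .sort() // good enough for a sketch; a real tool would sort by a timestamp from the manifest
    .map((runId) => {
      const gate = JSON.parse(readFileSync(join(root, runId, "gate.json"), "utf8"));
      return { runId, confidence: gate.confidence };
    });
}
```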
Artifact
Every run writes a RunArtifact directory at artifacts/runs/<run_id>/. The shape is stable:
```
artifacts/runs/run_1777.../
  manifest.json              # the structured RunArtifact (top-level)
  logs/
    build.log                # raw stdout/stderr per check
    lint.log
    test.log
  screenshots/
    desktop/                 # one folder per viewport
      home.png
      home.metadata.json
    mobile/
    tablet/
  videos/
    interaction-001.mp4      # full recording
    interaction-001-frames/
      frame-001.png          # ~15 motion-extracted key frames
      frame-002.png
  diagnose.json              # categorised issues from codeloop_diagnose
  gate.json                  # gate-check evidence + score
  trace.jsonl                # tool-call trace for the agent
```

Everything the dashboard, the GitHub Action sticky comment, and the Verified by CodeLoop badge show is computed from this directory — nothing is stored server-side unless you self-host with the API enabled.
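If it helps to have a mental model of the manifest, something like the TypeScript shape below is close in spirit. It is an illustrative subset, not the actual schema; the Tool reference documents the real fields.

```typescript
// Illustrative subset of what a RunArtifact manifest carries; field names are assumptions.
interface RunArtifact {
  run_id: string;                              // e.g. "run_1777327112458_mlpm3a"
  checks: Array<{
    name: string;                              // "build", "lint", "test", or a plugin name
    passed: boolean;
    log: string;                               // relative path, e.g. "logs/build.log"
  }>;
  screenshots: Array<{ viewport: string; path: string }>;
  videos: Array<{ path: string; frames: string[] }>;
}
```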
Gate
A gate is one of five binary checks that codeloop_gate_check evaluates against a run:
| Gate | What it asks | Severity |
|---|---|---|
| build_passes | Did the build / type-check / compile succeed? | blocker |
| required_tests_pass | Did every test marked required (and the touched test files) pass? | blocker |
| zero_critical_issues | Did diagnose find zero blocker / critical issues? | blocker |
| visual_regression_threshold | Are visual diffs and recorded interactions within tolerance vs. baselines? | warning |
| acceptance_criteria_met | Do the passing tests and run evidence cover the items in the spec / acceptance checklist? | blocker |
A run passes the gate when all blocker gates pass and the weighted score crosses the configured threshold. You can tune the threshold per project in .codeloop/config.json; the default of 94 % matches the “ready for human review” bar most teams want.
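As a sketch of that rule (with made-up weights and types, not the real scoring code): every blocker gate has to pass, and the weighted score has to clear the threshold.

```typescript
type GateSeverity = "blocker" | "warning";

interface GateResult {
  name: string;          // e.g. "build_passes"
  passed: boolean;
  severity: GateSeverity;
  weight: number;        // illustrative; the real weights are CodeLoop's own
}

// A run passes when all blocker gates pass and the weighted score clears the threshold.
function runPassesGate(gates: GateResult[], threshold = 94): boolean {
  const blockersOk = gates
    .filter((g) => g.severity === "blocker")
    .every((g) => g.passed);

  const total = gates.reduce((sum, g) => sum + g.weight, 0);
  const earned = gates.reduce((sum, g) => sum + (g.passed ? g.weight : 0), 0);
  const score = total === 0 ? 0 : (earned / total) * 100;

  return blockersOk && score >= threshold;
}
```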
Confidence score
Confidence is a single 0–100 number that summarises the run's evidence weighted by severity. It is deterministic for a given run — same artifacts, same score — so you can reason about it across CI, the dashboard, and the badge.
- ≥ 94 — ready_for_review: shippable assuming a human glance.
- 70 – 93 — continue_fixing: some non-blocker checks failed or the run lacks evidence (e.g. UI project with no screenshots). The diagnose output is your repair list.
- < 70 — blocked: blocker gates failed; do not ship.
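Written out purely in terms of the score (in practice blocked also means a blocker gate failed), the bands are just a pair of cutoffs, where 94 is whatever threshold you configured:

```typescript
type Recommendation = "ready_for_review" | "continue_fixing" | "blocked";

// Map a run's confidence score to the gate-check recommendation.
function recommend(confidence: number, threshold = 94): Recommendation {
  if (confidence >= threshold) return "ready_for_review";
  if (confidence >= 70) return "continue_fixing";
  return "blocked";
}
```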
Spec and acceptance checklist
The two files codeloop_gate_check reads to evaluate acceptance_criteria_met:
- Spec (e.g. SPEC.md or your _master.md) — what the project is supposed to do.
- Acceptance checklist (e.g. docs/E2E_TEST_CHECKLIST.md) — the testable claims that must be demonstrably true before you ship.
The pair turns “is this done?” into a query a machine can answer. For multi-section apps you also have per-section specs and per-section checklists — see multi-section orchestration.
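A sketch of that query, with made-up data shapes: every checklist item needs at least one piece of passing evidence from the run before acceptance_criteria_met can pass.

```typescript
interface ChecklistItem { id: string; claim: string }  // one testable claim from the checklist
interface Evidence { itemId: string; kind: "test" | "screenshot" | "recording"; passed: boolean }

// True only when every checklist item is backed by passing evidence.
function acceptanceCriteriaMet(items: ChecklistItem[], evidence: Evidence[]): boolean {
  return items.every((item) => evidence.some((e) => e.itemId === item.id && e.passed));
}
```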
Workflow enforcement
codeloop_check_workflow is a small but important guardrail: it fails loudly if the agent tried to declare a task complete without running the prerequisite tools (verify, screenshots, recordings, gate). It is the reason the loop is hard to accidentally skip — the agent gets back an explicit list of which tools it still owes.
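Conceptually it is a set-difference check, along the lines of this sketch (the prerequisite list here is illustrative; the real one depends on the task, e.g. UI work also owes screenshots and recordings):

```typescript
// Which prerequisite tools has the agent not called yet for this task?
function missingPrerequisites(calledTools: string[], required: string[]): string[] {
  return required.filter((tool) => !calledTools.includes(tool));
}

// e.g. missingPrerequisites(["codeloop_verify"], ["codeloop_verify", "codeloop_gate_check"])
//   -> ["codeloop_gate_check"]
```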
Plugin
A plugin is a JSON entry that maps any CLI test runner into the verify suite. CodeLoop ships first-party runners for Node / Web / Flutter / Xcode / Android / .NET; everything else — Python/Django, Ruby/Rails, Go, custom monorepo scripts — plugs in through .codeloop/plugins.json. Plugin failures count toward the gate score and surface in the dashboard exactly like first-party failures. See Plugin SDK.
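As a rough idea of what a plugin entry carries, here is a hypothetical shape written as a TypeScript interface; the real schema is in the Plugin SDK docs.

```typescript
// Hypothetical plugin entry shape, for orientation only.
interface PluginEntry {
  name: string;        // e.g. "django-tests"
  command: string;     // CLI to run, e.g. "python manage.py test"
  cwd?: string;        // working directory for the command
  required?: boolean;  // whether a failure should block the gate
}
```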
Section
For projects too large for one verify cycle, CodeLoop's section model lets the agent build a 10-screen app over many sessions. Each section has its own spec, acceptance criteria, and state (pending / in_progress / completed / blocked). See multi-section orchestration for the full state machine.
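The states themselves are easy to write down; the transition map below is only a sketch, and the multi-section orchestration page has the authoritative state machine.

```typescript
type SectionState = "pending" | "in_progress" | "completed" | "blocked";

// Illustrative forward transitions between section states.
const transitions: Record<SectionState, SectionState[]> = {
  pending: ["in_progress"],
  in_progress: ["completed", "blocked"],
  blocked: ["in_progress"],   // unblocked sections go back to work
  completed: [],
};
```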
Baseline
A baseline is the canonical screenshot for a given screen + viewport combination. codeloop_visual_review diffs every fresh screenshot against the matching baseline; if the pixel difference exceeds your threshold (default 2 %), the gate flags a regression. You promote a current screenshot to a baseline with codeloop_update_baseline once the change is intentional. See Visual review.
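The threshold check itself is small; a sketch, assuming the pixel diff comes from whatever image comparison the visual review runs:

```typescript
// Flag a regression when the fraction of differing pixels exceeds the threshold (default 2%).
function isVisualRegression(diffPixels: number, totalPixels: number, threshold = 0.02): boolean {
  return totalPixels > 0 && diffPixels / totalPixels > threshold;
}
```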
Design reference
For Figma-driven workflows, a design reference is a Figma frame (or local PNG under designs/) that the coded UI must match. codeloop_design_compare pixel-diffs the coded UI against the reference across multiple viewports and returns a match score. See Design compare.
Recording & replay
For interactive flows (login, checkout, drag-and-drop) screenshots aren't enough. CodeLoop records video of the running app while the agent calls codeloop_interact, then extracts ~15 motion-validated key frames. The dashboard plays the video alongside the frames and the correlated app log lines so you can prove a real interaction happened. See Recording & replay.
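The key-frame step is, conceptually, "keep a frame whenever enough pixels have moved since the last frame you kept". A sketch with a stand-in diff function, not the actual extractor:

```typescript
// Select up to maxFrames key frames based on motion between consecutive kept frames.
function selectKeyFrames(
  frames: Uint8Array[],
  frameDiff: (a: Uint8Array, b: Uint8Array) => number, // 0..1 fraction of changed pixels
  motionThreshold = 0.05,
  maxFrames = 15
): number[] {
  const kept: number[] = frames.length > 0 ? [0] : [];
  for (let i = 1; i < frames.length && kept.length < maxFrames; i++) {
    const lastKept = kept[kept.length - 1];
    if (frameDiff(frames[lastKept], frames[i]) >= motionThreshold) kept.push(i);
  }
  return kept; // indices of the frames worth writing out as frame-001.png, frame-002.png, ...
}
```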
How agents see CodeLoop
From the agent's point of view, CodeLoop is a set of 29 MCP tools. The agent doesn't decide when to call them — the rule files installed by codeloop init tell Cursor / Claude Code:
- After every code change, call codeloop_verify.
- If verify fails, call codeloop_diagnose, fix the tasks, re-verify.
- For UI work, capture screenshots and record interactions.
- Never declare done without codeloop_gate_check returning ready_for_review.
The agent uses its own LLM tokens for the coding part. CodeLoop adds zero LLM cost — every verify, diagnose, gate-check, screenshot, and recording is a deterministic local computation.
Where to go next
- Quick Start — install and run the loop in under 2 minutes.
- Tool reference — every tool, every parameter.
- Architecture — how the pieces fit together end-to-end.
- Glossary — quick definitions for every term in the docs.