
Automated QA for Claude Code Workflows

CodeLoop Team · April 23, 2026 · 6 min read


Claude Code is Anthropic's terminal-based AI coding agent. It writes, edits, and runs code directly from your command line. But like all AI agents, it needs a verification layer — something that checks whether the code it wrote actually works before it moves on.

CodeLoop is that layer. It runs as an MCP server that Claude Code calls natively, automating the verify-diagnose-fix loop until your code reaches high confidence.

Why Claude Code + CodeLoop works well

Claude Code already supports MCP (Model Context Protocol) natively. This means CodeLoop tools appear as first-class tools that Claude can call directly — no plugins, no wrappers, no browser extensions.

The integration is particularly clean because:

  • CLAUDE.md rules tell the agent exactly when to call CodeLoop and how to handle failures
  • Always-on activation means every new project auto-triggers CodeLoop after a one-time global install
  • Permissions are pre-configured: codeloop init sets up permissions.allow so Claude can run build/test commands without manual approval
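For reference, the allow-list that `codeloop init` writes into `.claude/settings.local.json` follows Claude Code's standard permissions format; the specific entries below are illustrative, since the generated file will reflect your project's own build and test commands:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run build:*)",
      "Bash(npm test:*)",
      "Bash(npx codeloop:*)"
    ]
  }
}
```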
Setup

    Step 1: Get your API key

Sign up from the command line (or use the web signup at codeloop.tech/signup):

    npx codeloop signup

    Your API key is saved automatically

    Step 2: Initialize in your project

    cd your-project

    npx codeloop init

    This creates .claude/settings.local.json (MCP config), .claude/AGENTS.md (agent rules), and .codeloop/config.json (project settings).
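The MCP registration inside `.claude/settings.local.json` likely follows Claude Code's standard `mcpServers` shape; the command and arguments shown here are an assumption for illustration, not the literal generated file:

```json
{
  "mcpServers": {
    "codeloop": {
      "command": "npx",
      "args": ["codeloop", "mcp"]
    }
  }
}
```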

    Step 3: Global activation (recommended)

    npx codeloop init --global

    This is the key step for Claude Code users. It:

    - Registers the MCP server in ~/.claude.json (global MCP config)

    - Merges CodeLoop instructions into ~/.claude/CLAUDE.md (global agent memory)

    After this, every new project you open with Claude Code will have CodeLoop tools available and the agent will know to use them. No per-project setup needed.

    The autonomous loop

    Once configured, Claude Code follows this pattern for every task:

  • Implement the feature as requested
  • Call codeloop_verify — runs build, lint, tests, captures screenshots
  • If failures: call codeloop_diagnose → get repair tasks → fix → verify again
  • Repeat until confidence reaches 94%
  • Call codeloop_gate_check — enforces build, tests, screenshots, video evidence, and design match
  • Move to next section if using multi-section orchestration
The CLAUDE.md rules explicitly enforce this: *"When gate returns continue_fixing, you MUST loop back to verify without asking the user. Max 15 iterations before escalation."*
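The pattern above can be sketched in Python; the `verify`, `diagnose`, `apply_fix`, and `gate_check` callables stand in for the corresponding `codeloop_*` tools, and the thresholds come from the rules quoted above:

```python
MAX_ITERATIONS = 15     # escalate to the user after this many passes
CONFIDENCE_GATE = 0.94  # the gate requires >= 94% confidence

def autonomous_loop(verify, diagnose, apply_fix, gate_check):
    """Sketch of the verify-diagnose-fix loop Claude Code follows."""
    for iteration in range(1, MAX_ITERATIONS + 1):
        result = verify()  # runs build, lint, tests, screenshots
        if result["confidence"] >= CONFIDENCE_GATE and gate_check(result):
            return {"status": "passed", "iterations": iteration}
        for task in diagnose(result):  # prioritized repair tasks
            apply_fix(task)
    return {"status": "escalate"}      # cap reached: ask the user
```

The important property is that the loop never stops to ask the user while the gate keeps returning failures, which is exactly what the CLAUDE.md rule mandates.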

    Multi-section orchestration

    For larger projects, CodeLoop manages an entire app build section-by-section:

    You: "Build a task management app with auth, dashboard, and settings"

    Claude Code + CodeLoop will:

  • Break this into 3 sections with a dependency graph
  • Implement section 1 (auth) → verify → fix → gate check → pass
  • Run codeloop_integration_check to ensure auth didn't break anything
  • Move to section 2 (dashboard) → same loop
  • Continue until all sections pass at 94%+ confidence
The agent works autonomously through the entire app without waiting for your input after each section. You come back to a fully built, verified application with a structured development log as evidence.
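The section-by-section flow amounts to walking a dependency graph in topological order. This sketch assumes a simple mapping of each section to its dependencies (the orchestrator's real data model isn't documented here):

```python
from graphlib import TopologicalSorter

def section_order(dependencies):
    """Return sections ordered so every dependency is built first.

    `dependencies` maps each section to the set of sections it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())

# auth has no dependencies; dashboard and settings both build on auth
plan = section_order({
    "auth": set(),
    "dashboard": {"auth"},
    "settings": {"auth"},
})
```

For the task-management example, `auth` always comes first, and the verify → fix → gate-check loop runs once per section in that order.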

    Interaction testing with Claude Code

    Claude Code can leverage the full interaction testing suite:

    Agent calls: codeloop_start_recording({ app_name: "MyApp" })

    Agent calls: codeloop_interact({ action: "click", x: 200, y: 300 })

    Agent calls: codeloop_interact({ action: "type", text: "user@example.com" })

    Agent calls: codeloop_interact({ action: "keystroke", key: "enter" })

    Agent calls: codeloop_stop_recording({ recording_id: "rec_..." })

    Agent calls: codeloop_interaction_replay({ expected_flow: "Login with email..." })

    The video is motion-validated and key frames are returned as images for Claude's vision model to verify — at zero additional cost (your Claude subscription's vision is used, not a separate API).
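The recorded flow above is essentially a script of actions. A minimal dispatcher over that script might look like this; the action names mirror the `codeloop_interact` calls, but the dispatcher itself and the `driver` interface are illustrative:

```python
def replay(actions, driver):
    """Replay a recorded interaction script against a driver object."""
    handlers = {
        "click": lambda a: driver.click(a["x"], a["y"]),
        "type": lambda a: driver.type(a["text"]),
        "keystroke": lambda a: driver.keystroke(a["key"]),
    }
    for action in actions:
        handlers[action["action"]](action)

# The login flow from the tool calls above, as plain data
login_flow = [
    {"action": "click", "x": 200, "y": 300},
    {"action": "type", "text": "user@example.com"},
    {"action": "keystroke", "key": "enter"},
]
```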

    Design verification

    If your project has Figma designs:

  • Place design references in designs/ or configure .codeloop/figma.json
  • Claude calls codeloop_design_compare to pixel-diff the coded UI against the design
  • The design_compare_evidence gate blocks completion until all viewports match
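Pixel-diffing in general reduces to counting mismatched pixels against a tolerance. This toy version over RGB tuples illustrates the idea only; CodeLoop's actual comparison, thresholds, and viewport handling are not shown here:

```python
def pixel_match_ratio(rendered, design, tolerance=8):
    """Fraction of pixels whose per-channel difference stays within tolerance.

    Both inputs are equal-length lists of (r, g, b) tuples.
    """
    matches = sum(
        all(abs(a - b) <= tolerance for a, b in zip(px1, px2))
        for px1, px2 in zip(rendered, design)
    )
    return matches / len(design)
```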
What makes this different from running tests manually

    You could tell Claude Code to run npm test yourself. The difference is:

  • Structured results: CodeLoop returns typed JSON, not raw terminal output the agent has to parse
  • Diagnosis: failures are categorized (bug, flaky test, config error) with prioritized repair tasks
  • Gates: quantified pass/fail at 94% confidence, not "looks like it passed"
  • Visual evidence: screenshots and video prove the app works, not just that tests pass
  • Persistence: every run is stored in artifacts/ with full lineage tracking
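To make the contrast with raw terminal output concrete, a structured verify result might look something like the following; every field name here is a hypothetical illustration, not CodeLoop's documented schema:

```json
{
  "confidence": 0.91,
  "build": { "status": "passed" },
  "tests": { "passed": 41, "failed": 2 },
  "failures": [
    { "category": "flaky_test", "file": "auth.test.ts", "priority": 2 }
  ]
}
```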
Pricing

    $5/mo for solo developers. The 14-day free trial includes the full Team-tier allowance. No credit card required.

    Start your free trial → | Read the docs →