
Automated QA for Claude Code Workflows

CodeLoop Team · April 23, 2026 · 6 min read


Claude Code is Anthropic's terminal-based AI coding agent. It writes, edits, and runs code directly from your command line. But like all AI agents, it needs a verification layer — something that checks whether the code it wrote actually works before it moves on.

CodeLoop is that layer. It runs as an MCP server that Claude Code calls natively, automating the verify-diagnose-fix loop until your code reaches high confidence.

Why Claude Code + CodeLoop works well

Claude Code already supports MCP (Model Context Protocol) natively. This means CodeLoop tools appear as first-class tools that Claude can call directly — no plugins, no wrappers, no browser extensions.

The integration is particularly clean because:

  • CLAUDE.md rules tell the agent exactly when to call CodeLoop and how to handle failures
  • Always-on activation means every new project auto-triggers CodeLoop after a one-time global install
  • Permissions are pre-configured: codeloop init sets up permissions.allow so Claude can run build/test commands without manual approval
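For reference, the allow-list that `codeloop init` writes into `.claude/settings.local.json` follows Claude Code's standard permissions format; the specific entries below are illustrative, since the generated file will reflect your project's own build and test commands:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run build:*)",
      "Bash(npm test:*)",
      "Bash(npx codeloop:*)"
    ]
  }
}
```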
Setup

    Step 1: Get your API key

Sign up from the command line (or use the web signup at codeloop.tech/signup):

    npx codeloop signup

    Your API key is saved automatically

    Step 2: Initialize in your project

    cd your-project

    npx codeloop init

    This creates .claude/settings.local.json (MCP config), .claude/AGENTS.md (agent rules), and .codeloop/config.json (project settings).
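The MCP registration inside `.claude/settings.local.json` likely follows Claude Code's standard `mcpServers` shape; the command and arguments shown here are an assumption for illustration, not the literal generated file:

```json
{
  "mcpServers": {
    "codeloop": {
      "command": "npx",
      "args": ["codeloop", "mcp"]
    }
  }
}
```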

    Step 3: Global activation (recommended)

    npx codeloop init --global

    This is the key step for Claude Code users. It:

    - Registers the MCP server in ~/.claude.json (global MCP config)

    - Merges CodeLoop instructions into ~/.claude/CLAUDE.md (global agent memory)

    After this, every new project you open with Claude Code will have CodeLoop tools available and the agent will know to use them. No per-project setup needed.

    The autonomous loop

    Once configured, Claude Code follows this pattern for every task:

  • Implement the feature as requested
  • Call codeloop_verify — runs build, lint, tests, captures screenshots
  • If failures: call codeloop_diagnose → get repair tasks → fix → verify again
  • Repeat until confidence reaches 94%
  • Call codeloop_gate_check — enforces build, tests, screenshots, video evidence, and design match
  • Move to next section if using multi-section orchestration
The CLAUDE.md rules explicitly enforce this: *"When gate returns continue_fixing, you MUST loop back to verify without asking the user. Max 15 iterations before escalation."*
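The pattern above can be sketched in Python; the `verify`, `diagnose`, `apply_fix`, and `gate_check` callables stand in for the corresponding `codeloop_*` tools, and the thresholds come from the rules quoted above:

```python
MAX_ITERATIONS = 15     # escalate to the user after this many passes
CONFIDENCE_GATE = 0.94  # the gate requires >= 94% confidence

def autonomous_loop(verify, diagnose, apply_fix, gate_check):
    """Sketch of the verify-diagnose-fix loop Claude Code follows."""
    for iteration in range(1, MAX_ITERATIONS + 1):
        result = verify()  # runs build, lint, tests, screenshots
        if result["confidence"] >= CONFIDENCE_GATE and gate_check(result):
            return {"status": "passed", "iterations": iteration}
        for task in diagnose(result):  # prioritized repair tasks
            apply_fix(task)
    return {"status": "escalate"}      # cap reached: ask the user
```

The important property is that the loop never stops to ask the user while the gate keeps returning failures, which is exactly what the CLAUDE.md rule mandates.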

    Multi-section orchestration

    For larger projects, CodeLoop manages an entire app build section-by-section:

    You: "Build a task management app with auth, dashboard, and settings"

    Claude Code + CodeLoop will:

  • Break this into 3 sections with a dependency graph
  • Implement section 1 (auth) → verify → fix → gate check → pass
  • Run codeloop_integration_check to ensure auth didn't break anything
  • Move to section 2 (dashboard) → same loop
  • Continue until all sections pass at 94%+ confidence
The agent works autonomously through the entire app without waiting for your input after each section. You come back to a fully built, verified application with a structured development log as evidence.
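The section-by-section flow amounts to walking a dependency graph in topological order. This sketch assumes a simple mapping of each section to its dependencies (the orchestrator's real data model isn't documented here):

```python
from graphlib import TopologicalSorter

def section_order(dependencies):
    """Return sections ordered so every dependency is built first.

    `dependencies` maps each section to the set of sections it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())

# auth has no dependencies; dashboard and settings both build on auth
plan = section_order({
    "auth": set(),
    "dashboard": {"auth"},
    "settings": {"auth"},
})
```

For the task-management example, `auth` always comes first, and the verify → fix → gate-check loop runs once per section in that order.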

    Interaction testing with Claude Code

    Claude Code can leverage the full interaction testing suite:

    Agent calls: codeloop_start_recording({ app_name: "MyApp" })

    Agent calls: codeloop_interact({ action: "click", x: 200, y: 300 })

    Agent calls: codeloop_interact({ action: "type", text: "user@example.com" })

    Agent calls: codeloop_interact({ action: "keystroke", key: "enter" })

    Agent calls: codeloop_stop_recording({ recording_id: "rec_..." })

    Agent calls: codeloop_interaction_replay({ expected_flow: "Login with email..." })

    The video is motion-validated and key frames are returned as images for Claude's vision model to verify — at zero additional cost (your Claude subscription's vision is used, not a separate API).
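The recorded flow above is essentially a script of actions. A minimal dispatcher over that script might look like this; the action names mirror the `codeloop_interact` calls, but the dispatcher itself and the `driver` interface are illustrative:

```python
def replay(actions, driver):
    """Replay a recorded interaction script against a driver object."""
    handlers = {
        "click": lambda a: driver.click(a["x"], a["y"]),
        "type": lambda a: driver.type(a["text"]),
        "keystroke": lambda a: driver.keystroke(a["key"]),
    }
    for action in actions:
        handlers[action["action"]](action)

# The login flow from the tool calls above, as plain data
login_flow = [
    {"action": "click", "x": 200, "y": 300},
    {"action": "type", "text": "user@example.com"},
    {"action": "keystroke", "key": "enter"},
]
```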

    Design verification

    If your project has Figma designs:

  • Place design references in designs/ or configure .codeloop/figma.json
  • Claude calls codeloop_design_compare to pixel-diff the coded UI against the design
  • The design_compare_evidence gate blocks completion until all viewports match
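Pixel-diffing in general reduces to counting mismatched pixels against a tolerance. This toy version over RGB tuples illustrates the idea only; CodeLoop's actual comparison, thresholds, and viewport handling are not shown here:

```python
def pixel_match_ratio(rendered, design, tolerance=8):
    """Fraction of pixels whose per-channel difference stays within tolerance.

    Both inputs are equal-length lists of (r, g, b) tuples.
    """
    matches = sum(
        all(abs(a - b) <= tolerance for a, b in zip(px1, px2))
        for px1, px2 in zip(rendered, design)
    )
    return matches / len(design)
```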
What makes this different from running tests manually

    You could tell Claude Code to run npm test yourself. The difference is:

  • Structured results: CodeLoop returns typed JSON, not raw terminal output the agent has to parse
  • Diagnosis: failures are categorized (bug, flaky test, config error) with prioritized repair tasks
  • Gates: quantified pass/fail at 94% confidence, not "looks like it passed"
  • Visual evidence: screenshots and video prove the app works, not just that tests pass
  • Persistence: every run is stored in artifacts/ with full lineage tracking
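To make the contrast with raw terminal output concrete, a structured verify result might look something like the following; every field name here is a hypothetical illustration, not CodeLoop's documented schema:

```json
{
  "confidence": 0.91,
  "build": { "status": "passed" },
  "tests": { "passed": 41, "failed": 2 },
  "failures": [
    { "category": "flaky_test", "file": "auth.test.ts", "priority": 2 }
  ]
}
```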
Pricing

    $5/mo for solo developers. The 14-day free trial includes the full Team-tier allowance. No credit card required.

    Start your free trial → | Read the docs →