How to Automate Testing for Cursor AI-Generated Code

Cursor is one of the fastest ways to write code with AI. But there's a gap between "the code compiles" and "the code works." Every Cursor user knows the cycle: ask the agent to implement a feature, test it manually, find five bugs, paste them back, fix three, introduce two new ones, and test again.

CodeLoop closes this gap by automating the entire verification loop inside Cursor.

What you get

After a one-time setup, your Cursor agent will automatically:

1. Run codeloop_verify after each implementation: build, lint, test, and screenshots in one call
2. Call codeloop_diagnose when failures occur: categorized repair tasks, prioritized by severity
3. Fix the issues using the structured repair tasks
4. Check the gate with codeloop_gate_check: pass/fail at 94% confidence
5. Loop until done: up to 15 iterations without human intervention
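The five steps above amount to a bounded retry loop. The sketch below is illustrative Python, not CodeLoop's implementation: the four callables are hypothetical stand-ins for the agent invoking codeloop_verify, codeloop_diagnose, applying a fix, and calling codeloop_gate_check, and the "status" and "severity" fields are assumed shapes.

```python
from typing import Callable

MAX_ITERATIONS = 15  # iteration cap quoted in this article


def run_verification_loop(
    verify: Callable[[], dict],
    diagnose: Callable[[dict], list],
    apply_fix: Callable[[dict], None],
    gate_check: Callable[[], bool],
    max_iterations: int = MAX_ITERATIONS,
) -> dict:
    """Repeat verify -> diagnose -> fix until the gate passes or the cap is hit.

    All four callables are hypothetical stand-ins for what the agent does.
    """
    for iteration in range(1, max_iterations + 1):
        result = verify()
        if result.get("status") == "pass" and gate_check():
            return {"passed": True, "iterations": iteration}
        # Work through repair tasks, most severe first
        # (assumed "severity" field, lower number = more severe).
        tasks = sorted(diagnose(result), key=lambda t: t.get("severity", 0))
        for task in tasks:
            apply_fix(task)
    # Cap reached without a passing gate: hand control back to the human.
    return {"passed": False, "iterations": max_iterations}
```

The cap is what keeps the loop safe to run unattended: a fix that never converges ends in an escalation instead of an endless cycle.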

Setup (under 2 minutes)

Step 1: Get your API key

Once you have a key, add it to your shell profile (~/.zshrc or ~/.bashrc):

export CODELOOP_API_KEY="cl_live_your_key_here"
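A shell profile is only read by new shell sessions, so it can be worth confirming that the variable is actually visible before moving on. A minimal Python sketch of such a check follows; the function name and the three status strings are made up for illustration, and it never prints the key itself.

```python
import os


def codeloop_key_status(env=None) -> str:
    """Report whether CODELOOP_API_KEY looks usable, without revealing it."""
    env = os.environ if env is None else env
    key = env.get("CODELOOP_API_KEY", "")
    if not key:
        return "missing"
    if key == "cl_live_your_key_here":
        # The placeholder from the export line above was never replaced.
        return "placeholder"
    return "set"
```

Calling `codeloop_key_status()` from a fresh terminal should return `"set"` once the profile has been reloaded.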

Step 2: Initialize in your project

cd your-project
npx codeloop init

This creates the MCP config at .cursor/mcp.json and sets up agent rules that tell Cursor when and how to call CodeLoop tools.
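To confirm that init wrote the config, you can list the servers it registered. The sketch below assumes the usual Cursor layout, where servers sit under a top-level "mcpServers" key; the name CodeLoop registers itself under is not stated here, so "codeloop" in the usage note is a guess.

```python
import json
from pathlib import Path


def registered_mcp_servers(config_path) -> list:
    """Return the server names listed under "mcpServers" in an MCP config file."""
    path = Path(config_path)
    if not path.is_file():
        return []  # config not created yet
    data = json.loads(path.read_text(encoding="utf-8"))
    return sorted(data.get("mcpServers", {}))
```

For example, `"codeloop" in registered_mcp_servers(".cursor/mcp.json")` would be true if the server is registered under that (assumed) name.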

Step 3: Enable Auto-Run mode

By default, Cursor prompts you to approve every terminal command. To let the verification loop run uninterrupted:

1. Open Settings: Cmd+Shift+J (Mac) or Ctrl+Shift+J (Windows/Linux)
2. Go to Features > Terminal
3. Set Auto-Run Mode to "Yolo" (runs everything) or "Auto-Run with Allowlist" (safer)

Step 4 (optional): Global activation

Want CodeLoop active in every future project without running init again?

npx codeloop init --global

This registers the MCP server globally in ~/.cursor/mcp.json so CodeLoop tools are available in every workspace.

What the loop looks like in practice

You ask Cursor: *"Implement the login screen with email/password validation."*

The agent writes the code, then automatically calls codeloop_verify. The output looks like:

{
  "status": "fail",
  "build": { "passed": true },
  "tests": { "passed": 8, "failed": 2 },
  "confidence": 0.72
}

The agent calls codeloop_diagnose, gets repair tasks, fixes the two failures, and calls codeloop_verify again. This time: 10/10 tests pass, confidence 0.94, gate passes. Done — without you touching anything.
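The decision the agent makes from that payload can be sketched in a few lines of Python. This assumes the payload has exactly the fields shown in the example above and that 0.94 is the gate threshold, which this walkthrough implies but does not state outright; the return values are illustrative labels, not CodeLoop identifiers.

```python
import json

GATE_CONFIDENCE = 0.94  # assumed threshold, inferred from the example run


def next_action(verify_output: str, threshold: float = GATE_CONFIDENCE) -> str:
    """Map a codeloop_verify payload to the agent's next step."""
    result = json.loads(verify_output)
    build_ok = result.get("build", {}).get("passed", False)
    failed = result.get("tests", {}).get("failed", 0)
    confidence = result.get("confidence", 0.0)
    if not build_ok or failed > 0 or confidence < threshold:
        return "diagnose"    # something still needs repair
    return "gate_check"      # clean run: ask the gate for a verdict
```

Fed the failing payload above, this returns `"diagnose"`; the second run (10 passed, 0 failed, confidence 0.94) returns `"gate_check"`.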

Design comparison with Figma

If you have Figma designs, CodeLoop can compare your coded UI against them:

1. Export your Figma frames to designs/ or configure .codeloop/figma.json with your Figma API token
2. The agent calls codeloop_design_compare to pixel-diff across viewports
3. A blocker gate (design_compare_evidence) prevents shipping until the match score meets the threshold
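To make "pixel-diff" and "match score" concrete, here is a deliberately simple Python sketch: the score is the fraction of pixels that agree between a design frame and a screenshot. How CodeLoop actually computes its score is not described here, so the metric, the tolerance parameter, and the 0.95 threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # RGB
Image = List[List[Pixel]]      # rows of pixels


def match_score(design: Image, actual: Image, tolerance: int = 0) -> float:
    """Fraction of pixels whose channels all differ by at most `tolerance`."""
    same_shape = len(design) == len(actual) and all(
        len(d) == len(a) for d, a in zip(design, actual)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in design)
    if total == 0:
        return 1.0
    same = sum(
        all(abs(c1 - c2) <= tolerance for c1, c2 in zip(p1, p2))
        for row_d, row_a in zip(design, actual)
        for p1, p2 in zip(row_d, row_a)
    )
    return same / total


def design_gate(score: float, threshold: float = 0.95) -> bool:
    """Hypothetical blocker: pass only when the score meets the threshold."""
    return score >= threshold
```

A small tolerance is a common way to keep anti-aliasing and font rendering differences from failing an otherwise faithful implementation.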

This is particularly powerful for UI-heavy projects where "it works" isn't enough — it also needs to *look right*.

Video recording and interaction testing

For interactive apps, CodeLoop goes beyond screenshots:

1. codeloop_start_recording begins a window-scoped video recording
2. codeloop_interact performs real UI actions: click, type, swipe, scroll
3. codeloop_stop_recording finalizes the video
4. codeloop_interaction_replay extracts key frames for visual verification

The video is motion-validated — static recordings (where the app didn't actually respond) are automatically rejected by the gate.
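One simple way to detect a static recording is to compare consecutive frames and require that at least some pixels change. The Python sketch below illustrates that idea only; the frame representation (flattened grayscale values) and both thresholds are assumptions, and CodeLoop's actual motion check may work differently.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixel values, 0-255


def changed_fraction(a: Frame, b: Frame, pixel_delta: int = 8) -> float:
    """Fraction of pixels that differ by more than `pixel_delta`."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0
    changed = sum(abs(x - y) > pixel_delta for x, y in zip(a, b))
    return changed / len(a)


def has_motion(frames: List[Frame], min_fraction: float = 0.01) -> bool:
    """True if any pair of consecutive frames differs enough to count as motion."""
    return any(
        changed_fraction(prev, cur) >= min_fraction
        for prev, cur in zip(frames, frames[1:])
    )
```

A recording where `has_motion` is false shows an app that never visibly responded to the interactions, which is the case the gate is meant to reject.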

Tips for best results

- Use test filters for focused verification: the test_filter parameter lets you run only relevant tests
- Start with the verify-fix loop, then add visual review and design comparison as your project matures
- Let the agent iterate: the rules enforce up to 15 fix attempts before escalating to you
- Check the development log: codeloop_generate_dev_report creates a structured evidence trail of every run
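For the test_filter tip, it may help to see roughly what a filtered call looks like on the wire. MCP tools are invoked with a JSON-RPC tools/call request, so a sketch in Python could build one as below. The tool and parameter names come from this article; the filter value "login" and its matching semantics are assumptions, and in practice the Cursor agent constructs this request for you.

```python
import json


def verify_request(test_filter: str, request_id: int = 1) -> str:
    """Build an MCP tools/call request for codeloop_verify with a test filter."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {
                "name": "codeloop_verify",
                # Filter syntax is an assumption; check the tool's schema.
                "arguments": {"test_filter": test_filter},
            },
        }
    )
```

For example, `verify_request("login")` would ask for only the login-related tests while you iterate on that screen.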

Pricing

CodeLoop is $5/mo for solo developers. The 14-day trial gives you the full Team-tier allowance — unlimited verifications, 5,000 visual reviews, 2,000 design comparisons. No credit card required.
