Visual review

This runs automatically. After npx codeloop init on a UI project, every codeloop_verify captures screenshots, diffs them against baselines, and gates on the pixel threshold, with no configuration. Read on only if you want to tune behaviour or understand a specific result.

Visual review is the part of the loop that catches the bugs your unit tests don't: a button shifted 4 px right, a font that fell back, a dialog that re-flowed on tablet, a colour that drifted off brand. It runs on every codeloop_verify for any UI project and gates the loop on a pixel-difference threshold.

The flow

codeloop_capture_screenshot           (per screen × viewport)
   ↓
artifacts/runs/<run_id>/screenshots/  (current frame)
   ↓
codeloop_visual_review                (diff against baseline)
   ↓
.codeloop/baselines/                   (canonical reference)
   ↓
gate.visual_regression_threshold      (pass / warn / fail)

Capture screenshots

The agent (or the CLI) calls codeloop_capture_screenshot once per screen, once per viewport. The default viewports are mobile, tablet, and desktop. Override them in .codeloop/config.json; the example below adds a fourth viewport, wide:

{
  "screenshots": {
    "enabled": true,
    "tool": "playwright",
    "base_url": "http://localhost:3000",
    "viewports": [
      { "name": "mobile",  "width": 375,  "height": 812  },
      { "name": "tablet",  "width": 768,  "height": 1024 },
      { "name": "desktop", "width": 1440, "height": 900  },
      { "name": "wide",    "width": 1920, "height": 1080 }
    ],
    "wait_for": "networkidle",
    "full_page": true
  }
}

For Flutter and native projects, swap "tool": "playwright" for "tool": "maestro" (Flutter) or "tool": "simctl" / "tool": "adb" (iOS / Android). See the Cross-OS runbook for the per-OS capture matrix.
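The screen × viewport fan-out can be sketched as below. This is a minimal illustration rather than CodeLoop's implementation: the CaptureJob type, the plan_captures function, and the screen-name-to-path mapping are hypothetical. Only base_url and the viewport fields come from the config above.

```python
from dataclasses import dataclass
from itertools import product
from urllib.parse import urljoin


@dataclass(frozen=True)
class CaptureJob:
    """One screenshot to take: a screen at one viewport (hypothetical type)."""
    screen: str
    url: str
    viewport: str
    width: int
    height: int


def plan_captures(config: dict, screens: dict[str, str]) -> list[CaptureJob]:
    """Expand screens x viewports into one job per combination.

    `screens` maps a screen name to a path relative to base_url. That
    mapping is an assumption made for this example.
    """
    shots = config["screenshots"]
    base = shots["base_url"].rstrip("/") + "/"
    jobs = []
    for (name, path), vp in product(screens.items(), shots["viewports"]):
        jobs.append(CaptureJob(
            screen=name,
            url=urljoin(base, path.lstrip("/")),
            viewport=vp["name"],
            width=vp["width"],
            height=vp["height"],
        ))
    return jobs


config = {
    "screenshots": {
        "base_url": "http://localhost:3000",
        "viewports": [
            {"name": "mobile", "width": 375, "height": 812},
            {"name": "desktop", "width": 1440, "height": 900},
        ],
    }
}
jobs = plan_captures(config, {"home": "/", "profile": "/profile"})
for job in jobs:
    print(job.screen, job.viewport, job.url)
```

Two screens at two viewports yield four jobs, so adding a viewport adds one capture per screen.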

Discover screens automatically

Don't want to enumerate every URL? Call codeloop_discover_screens once and CodeLoop will crawl your app, follow internal links to a configurable depth, and emit the screen list it found. The list is cached at .codeloop/screens.json and re-used by subsequent runs until you re-run discover.
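A depth-limited crawl of this kind is commonly written as a breadth-first search. The sketch below shows the idea on an in-memory link map. The discover_screens function and the site data are hypothetical, and the link map stands in for fetching each page and extracting its internal links. How CodeLoop actually crawls is not specified here.

```python
from collections import deque


def discover_screens(links: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Breadth-first walk over internal links, at most max_depth hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(page, []):
            if target not in seen:  # visit each screen once, even with cyclic links
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: each key links to the listed pages.
site = {
    "/": ["/pricing", "/login"],
    "/pricing": ["/pricing/enterprise", "/"],
    "/login": ["/reset"],
    "/pricing/enterprise": ["/contact"],
}

shallow = discover_screens(site, "/", max_depth=1)
deeper = discover_screens(site, "/", max_depth=2)
print(shallow)
print(deeper)
```

Raising the depth from 1 to 2 picks up /pricing/enterprise and /reset but still leaves /contact undiscovered, which is the trade-off a depth setting controls.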

Compare against baselines

codeloop_visual_review diffs every fresh screenshot against the matching baseline at .codeloop/baselines/<screen>/<viewport>.png. The diff uses pixelmatch and produces three artifacts per screen:

Current: what the app rendered this run.
Baseline: the canonical reference.
Diff: a red overlay highlighting only the changed pixels.

All three are visible side by side in the local dashboard, sorted worst-to-best by score so you see the biggest regressions first.
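To make the score concrete, here is a deliberately simplified stand-in for the diff step. It counts exactly-unequal pixels, whereas pixelmatch uses a perceptual colour distance and anti-aliasing detection, so real scores will differ. The function names, the flat-list image representation, and the assumption that the score is the fraction of changed pixels are all illustrative.

```python
def diff_ratio(current: list[tuple], baseline: list[tuple]) -> float:
    """Fraction of pixels that differ between two same-sized images.

    Images are flat lists of (r, g, b, a) tuples. Exact inequality is a
    simplification of what pixelmatch does.
    """
    if len(current) != len(baseline):
        raise ValueError("images must have the same dimensions")
    if not current:
        return 0.0
    changed = sum(1 for c, b in zip(current, baseline) if c != b)
    return changed / len(current)


def rank_worst_first(scores: dict[str, float]) -> list[str]:
    """Order screens so the largest diff ratio comes first."""
    return sorted(scores, key=scores.get, reverse=True)


WHITE = (255, 255, 255, 255)
RED = (255, 0, 0, 255)

baseline = [WHITE] * 100
current = [WHITE] * 97 + [RED] * 3  # three of 100 pixels changed

ratio = diff_ratio(current, baseline)
ranking = rank_worst_first({
    "home/desktop": ratio,
    "home/mobile": 0.0,
    "profile/desktop": 0.11,
})
print(ratio, ranking)
```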

The threshold

The default tolerance is 2 % of pixels differing, computed across the whole screen rather than within a bounding box around the change. Change it in .codeloop/config.json:

{
  "visual_review": {
    "threshold": 0.02,
    "baseline_dir": ".codeloop/baselines",
    "ignore_regions": [
      { "screen": "home",    "rect": [0, 0, 1440, 64] },
      { "screen": "profile", "rect": [120, 480, 320, 540] }
    ]
  }
}

ignore_regions is the practical escape hatch for timestamps, ad slots, A/B test variants, or anything else that legitimately changes between runs.
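The sketch below shows how a threshold and ignore regions might interact. It rests on several assumptions that the config above does not settle: rect is read as [left, top, right, bottom] with exclusive right and bottom edges, ignored pixels still count toward the whole-screen denominator, and a run exceeds the threshold only when the ratio is strictly greater. All function names are hypothetical.

```python
def in_any_rect(x: int, y: int, rects: list[list[int]]) -> bool:
    """True if (x, y) lies inside any [left, top, right, bottom] rectangle (assumed layout)."""
    return any(left <= x < right and top <= y < bottom
               for left, top, right, bottom in rects)


def masked_diff_ratio(current: list, baseline: list, width: int, height: int,
                      rects: list[list[int]]) -> float:
    """Share of the whole screen that changed, skipping pixels in ignored regions.

    `current` and `baseline` are row-major lists of pixel values.
    """
    changed = 0
    for y in range(height):
        for x in range(width):
            i = y * width + x
            if current[i] != baseline[i] and not in_any_rect(x, y, rects):
                changed += 1
    return changed / (width * height)


def exceeds_threshold(ratio: float, threshold: float = 0.02) -> bool:
    """Assumed comparison: strictly greater than the threshold."""
    return ratio > threshold


# 10 x 10 image: the top row is a changing banner, plus one real change at (5, 5).
width = height = 10
baseline = [0] * (width * height)
current = list(baseline)
for x in range(width):
    current[x] = 1                 # row y = 0
current[5 * width + 5] = 1         # pixel (5, 5)

unmasked = masked_diff_ratio(current, baseline, width, height, rects=[])
masked = masked_diff_ratio(current, baseline, width, height, rects=[[0, 0, 10, 1]])
print(unmasked, exceeds_threshold(unmasked))
print(masked, exceeds_threshold(masked))
```

Masking the banner row drops the ratio from 11 % to 1 %, which moves the run from above the default 2 % threshold to below it while the genuine change at (5, 5) is still counted.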

The gate

Visual review feeds into the visual_regression_threshold gate in codeloop_gate_check. By default it is a warning-severity gate — it lowers your confidence score but does not block the gate from passing on its own. Promote it to a blocker if you want pixel regressions to stop the loop:

{
  "gate_check": {
    "min_confidence": 0.94,
    "visual_severity": "blocker"
  }
}
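The pass / warn / fail outcome can be sketched as a small decision function. This is an assumption-laden illustration: the visual_gate and severity_from_config names are hypothetical, "warning" is assumed to be the literal default value of visual_severity, and the effect on the confidence score is left out because its formula is not described here.

```python
def severity_from_config(config: dict) -> str:
    """Read gate_check.visual_severity, assuming "warning" when it is unset."""
    return config.get("gate_check", {}).get("visual_severity", "warning")


def visual_gate(ratio: float, threshold: float = 0.02, severity: str = "warning") -> str:
    """Map a diff ratio to pass / warn / fail.

    Within the threshold the gate passes. Above it, a warning-severity gate
    only warns, while a blocker-severity gate fails.
    """
    if ratio <= threshold:
        return "pass"
    return "fail" if severity == "blocker" else "warn"


default_cfg = {}
strict_cfg = {"gate_check": {"min_confidence": 0.94, "visual_severity": "blocker"}}

results = [
    visual_gate(0.01, severity=severity_from_config(default_cfg)),
    visual_gate(0.05, severity=severity_from_config(default_cfg)),
    visual_gate(0.05, severity=severity_from_config(strict_cfg)),
]
print(results)
```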

Promoting an intentional change

When the diff is correct (you really did move that button), promote the current screenshot to the new baseline. Three ways to do it:

From the agent

Tell your agent “baseline the new home page”. It calls codeloop_update_baseline, which copies artifacts/runs/<run_id>/screenshots/... into .codeloop/baselines/....

From the CLI

# accept everything from the latest run
npx codeloop baseline update

# promote one screen + viewport
npx codeloop baseline update --screen home --viewport desktop

# accept by run id (older run)
npx codeloop baseline update --run run_177xxxx

From the dashboard

Click Promote to baseline on the diff card. The dashboard writes the file change and prompts you to commit it.
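At the file level, a promotion is a copy from the run's screenshot to the baseline path. The sketch below illustrates that copy under assumptions: the promote and baseline_path functions are hypothetical, the source file name is made up because the layout under the run's screenshots directory is not spelled out here, and only the destination pattern .codeloop/baselines/<screen>/<viewport>.png comes from this page.

```python
import shutil
import tempfile
from pathlib import Path


def baseline_path(root: Path, screen: str, viewport: str) -> Path:
    """Destination for a baseline: .codeloop/baselines/<screen>/<viewport>.png."""
    return root / ".codeloop" / "baselines" / screen / f"{viewport}.png"


def promote(root: Path, shot: Path, screen: str, viewport: str) -> Path:
    """Copy a current screenshot over the baseline, creating folders as needed."""
    dest = baseline_path(root, screen, viewport)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(shot, dest)
    return dest


# Demo in a throwaway directory so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    shot = root / "current-home-desktop.png"  # hypothetical source file name
    shot.write_bytes(b"fake-png-bytes")
    dest = promote(root, shot, "home", "desktop")
    promoted_ok = dest.read_bytes() == b"fake-png-bytes"
    relative = dest.relative_to(root).as_posix()

print(relative, promoted_ok)
```

Because each promotion touches exactly one PNG, the resulting git change stays small and easy to review.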

Visual attribution

For complex pages where a small change has an unclear cause, codeloop_visual_attribution drills into the diff and tries to attribute the change to a CSS class, a component name, or a recent commit using the touched file list from the verify run. This is the bridge between “something regressed” and a concrete repair task.

Working with baselines in git

Commit baselines. They are the source of truth for what your UI is supposed to look like.
Use Git LFS for big projects. A 10-screen × 4-viewport baseline is 40 PNGs, which is usually fine, but if you have per-locale variants, LFS keeps the diff fast.
Review baseline updates in PR. The dashboard's “Promote to baseline” button writes one PNG per promotion: small, reviewable changes that compose well in code review.

CI and the GitHub Action

The CodeLoop Verify GitHub Action runs visual review on every PR. The sticky comment surfaces the worst regressions inline and links into the dashboard for the full diff. The Verified by CodeLoop badge encodes the visual score as part of the overall confidence number.

Common gotchas

Fonts not loaded. Set "wait_for": "networkidle" or wait for document.fonts.ready in your test harness.
Animations / video. CodeLoop pauses CSS animations before capture. If you have a custom canvas, expose a data-animation-ready hook the screenshot tool can wait for.
Small anti-aliasing (AA) differences. Use the threshold field; 1–2 % is usually the right ballpark for sub-pixel anti-alias drift between OSes.
No baselines yet. The first verify run after init captures screenshots but has nothing to diff. Either commit them as baselines explicitly, or rely on the visual_regression_threshold gate's “no baselines” warning.

Core concepts: baseline
Design compare: Figma-driven equivalent.
Recording & replay: for interactive flows beyond static screens.
Tool reference