Design compare (Figma)

One-time setup, then automatic. Drop a designs/ folder in your repo (or paste a Figma file key into .codeloop/figma.json) and CodeLoop runs design compare on every verify, with no further input. Read this page when you want to wire it up the first time, or when you need to interpret a low match score.

Design compare is the gate that closes the loop with your designer. For every screen × viewport combination, CodeLoop fetches the canonical Figma frame (or a local PNG reference), pixel-diffs it against the coded UI, and returns a match score the gate can act on. It is the QA layer that keeps a Figma file and a shipped product actually aligned.

Two modes

Figma mode: designs live in a Figma file. You give CodeLoop a file key and a frame map; it fetches the frames over the Figma REST API at run time.
Local mode: designs live on disk under designs/<screen>/<viewport>.png. No credentials, no network. Useful for non-Figma teams or air-gapped builds.

Both modes feed into the same codeloop_design_compare tool and the same gate.

Figma mode

1. Get a Figma personal access token

Open Figma » Settings » Personal access tokens.
Click Create new token, give it a name (e.g. codeloop), copy it.
Export the token in your shell (and your CI secrets):

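# macOS / Linux (applies to the current shell session)
export FIGMA_API_TOKEN="figd_..."

# Windows PowerShell (persists for the current user; open a new terminal to pick it up)
[System.Environment]::SetEnvironmentVariable("FIGMA_API_TOKEN", "figd_...", "User")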

2. Map screens to Figma frames

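Drop a .codeloop/figma.json in the project root. Get each frame's link from the Figma right-click menu » Copy link; the node ID used below is the node-id parameter of that URL (newer Figma links write 1:24 as 1-24).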

{
  "file_key": "ABC123abcXYZ",
  "frames": {
    "home": {
      "desktop": "1:24",
      "tablet":  "1:48",
      "mobile":  "1:72"
    },
    "checkout": {
      "desktop": "1:96",
      "mobile":  "1:120"
    }
  },
  "scale": 2
}

scale: 2 exports the frame at @2x so the diff matches a retina screenshot. Use 1 for non-retina runners.
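
As a rough illustration of what Figma mode involves, the sketch below turns a frame map like the one above into a single request URL for Figma's image-export endpoint (GET /v1/images/:file_key, authenticated with the X-Figma-Token header). How CodeLoop batches its own requests is not documented on this page; build_export_url and the fallback scale of 1 are assumptions made for the example.

```python
import json
from urllib.parse import urlencode

FIGMA_IMAGES_ENDPOINT = "https://api.figma.com/v1/images"


def build_export_url(config: dict) -> str:
    """Build one image-export URL covering every frame in the map (hypothetical helper)."""
    # Collect each distinct node ID across all screens and viewports.
    node_ids = sorted(
        {
            node_id
            for viewports in config["frames"].values()
            for node_id in viewports.values()
        }
    )
    query = urlencode(
        {
            "ids": ",".join(node_ids),
            "scale": config.get("scale", 1),  # assumed default when "scale" is absent
            "format": "png",
        }
    )
    return f"{FIGMA_IMAGES_ENDPOINT}/{config['file_key']}?{query}"


config = json.loads(
    """
    {
      "file_key": "ABC123abcXYZ",
      "frames": {"home": {"desktop": "1:24", "tablet": "1:48", "mobile": "1:72"}},
      "scale": 2
    }
    """
)

url = build_export_url(config)
# Send this with the header  X-Figma-Token: $FIGMA_API_TOKEN  to receive image URLs.
print(url)
```

Requesting all frames in one call keeps the request count low, which matters for the rate limit discussed under Common gotchas.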

3. Run the compare

The agent calls codeloop_design_compare as part of the verify loop. You can also run it manually from the CLI:

# everything mapped in figma.json
npx codeloop design

# one screen
npx codeloop design --screen home

# fail (non-zero exit) if score below threshold
npx codeloop design --threshold 0.85

Local mode

For teams that don't use Figma, drop PNGs into designs/ at the project root. The directory shape is enough; no config is required:

designs/
  home/
    desktop.png
    tablet.png
    mobile.png
  checkout/
    desktop.png
    mobile.png

codeloop_design_compare matches each PNG to the screenshot under artifacts/runs/<run_id>/screenshots/<viewport>/<screen>.png and runs the same diff.
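
The pairing rule can be sketched as a small path transformation. This is only an illustration of the naming convention described above; screenshot_for is a hypothetical helper, not part of CodeLoop.

```python
from pathlib import PurePosixPath


def screenshot_for(reference: str, run_id: str) -> PurePosixPath:
    """Map designs/<screen>/<viewport>.png to the screenshot it is diffed against."""
    ref = PurePosixPath(reference)
    screen = ref.parent.name   # directory name, e.g. "home"
    viewport = ref.stem        # file name without ".png", e.g. "desktop"
    return (
        PurePosixPath("artifacts/runs")
        / run_id
        / "screenshots"
        / viewport
        / f"{screen}.png"
    )


print(screenshot_for("designs/home/desktop.png", "run_001"))
```

Note that the two trees are transposed: references are grouped by screen, screenshots by viewport.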

Match score

Each screen × viewport gets a score in [0, 1]:

1.0: pixel-perfect match.
0.9 – 0.99: minor differences (sub-pixel anti-aliasing, font smoothing).
0.7 – 0.89: recognisable drift (spacing, colour, font weight).
< 0.7: the coded UI does not match the design.
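
When reading scores in a script or a report, the bands above can be applied with a few comparisons. The sketch below is one possible reading: it treats each lower bound as inclusive so that values such as 0.895 fall into a band, which is an assumption, and describe_score and its labels are hypothetical.

```python
def describe_score(score: float) -> str:
    """Translate a match score in [0, 1] into the band it falls in."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score == 1.0:
        return "pixel-perfect"
    if score >= 0.9:
        return "minor"
    if score >= 0.7:
        return "drift"
    return "mismatch"


for value in (1.0, 0.95, 0.85, 0.5):
    print(value, describe_score(value))
```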

The default gate threshold is 0.85. Tune it in .codeloop/config.json:

{
  "design_compare": {
    "threshold": 0.9,
    "ignore_regions": [
      { "screen": "home", "rect": [0, 0, 1440, 64] }
    ],
    "scoring": "weighted_lab"
  }
}

scoring picks the diff metric: pixel (raw pixelmatch), weighted_lab (perceptual, weights luminance higher than chroma; recommended), or structural (SSIM, ignores small colour drift).
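
To make the idea of a score with ignored regions concrete, here is a deliberately simplified sketch: the fraction of exactly matching pixels outside the ignored rectangles. It is not pixelmatch, the weighted Lab metric, or SSIM, and it assumes rect means [x, y, width, height], which this page does not specify. pixel_score is a hypothetical helper operating on plain nested lists of RGB tuples.

```python
def pixel_score(design, shot, ignore_rects=()):
    """Return the share of identical pixels, skipping ignored rectangles."""
    if len(design) != len(shot) or any(len(a) != len(b) for a, b in zip(design, shot)):
        raise ValueError("images must have identical dimensions")

    def ignored(x, y):
        # Assumed rect layout: (x, y, width, height).
        return any(
            rx <= x < rx + rw and ry <= y < ry + rh
            for rx, ry, rw, rh in ignore_rects
        )

    compared = matched = 0
    for y, (design_row, shot_row) in enumerate(zip(design, shot)):
        for x, (a, b) in enumerate(zip(design_row, shot_row)):
            if ignored(x, y):
                continue
            compared += 1
            matched += a == b
    if compared == 0:
        raise ValueError("ignore regions cover the whole image")
    return matched / compared


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
design = [[WHITE, WHITE], [WHITE, WHITE]]
shot = [[WHITE, BLACK], [WHITE, WHITE]]  # one differing pixel at x=1, y=0

print(pixel_score(design, shot))                   # 0.75
print(pixel_score(design, shot, [(1, 0, 1, 1)]))   # 1.0
```

Masking a region removes its pixels from both the numerator and the denominator, so a dynamic header or timestamp cannot drag the score down.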

The gate

Design compare contributes the design_compare_evidence sub-gate inside visual_regression_threshold. By default it has warning severity. Promote it to blocker when your team enforces pixel-accurate delivery:

{
  "gate_check": {
    "design_severity": "blocker"
  }
}
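
The interaction between threshold and severity can be pictured as follows. This is a sketch under assumptions: the outcome names ("pass", "warn", "fail") and gate_outcome itself are hypothetical, and a score exactly at the threshold is treated as passing because the CLI fails only when the score is below it.

```python
def gate_outcome(scores: dict, threshold: float = 0.85, severity: str = "warning"):
    """Return (outcome, failing screens) for a set of screen/viewport scores."""
    failing = sorted(name for name, score in scores.items() if score < threshold)
    if not failing:
        return "pass", []
    # A warning-severity gate reports the problem; a blocker stops the run.
    return ("fail" if severity == "blocker" else "warn"), failing


scores = {"home/desktop": 0.92, "checkout/mobile": 0.80}
print(gate_outcome(scores))
print(gate_outcome(scores, severity="blocker"))
```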

Scoping the gate to changed screens

A focused PR that touches one page should not be blocked by a dozen unrelated baselines that were already broken before the change. Use design_compare.include / design_compare.exclude to scope the gate to the screens you actually changed:

{
  "design_compare": {
    "include": ["photometric-saved-*", "batch-export"],
    "exclude": ["led-design-bom-*", "lum-design-bom-*"]
  }
}

Patterns match the derived screen name (the filename under designs/, minus the extension and any @2x suffix) case-insensitively. Supported globs:

login — exact match
led-design-bom-* — prefix wildcard
*-summary — suffix wildcard
photometric-*-page — middle wildcard

Precedence: include (when non-empty) keeps only screens that match at least one include pattern; exclude then drops any remaining screen that matches an exclude pattern. Reference files on disk are unchanged, so a later PR with broader scope can re-enforce them by removing the filters.

If every reference is filtered out, the gate fails with a clear message instead of silently passing — that protects you from typos that would otherwise turn the gate into a no-op.
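
The filtering rules above can be approximated in a few lines with Python's fnmatch module. This is a sketch of the described behaviour, not CodeLoop's implementation: scope_screens is a hypothetical name, and fnmatch also accepts ? and [seq] patterns, which the list above does not mention.

```python
from fnmatch import fnmatchcase


def scope_screens(screens, include=(), exclude=()):
    """Apply include, then exclude, matching case-insensitively."""

    def matches(name, patterns):
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    # An empty include list keeps everything; otherwise keep only matches.
    kept = [s for s in screens if not include or matches(s, include)]
    # Exclude is applied second, so it wins over include.
    kept = [s for s in kept if not matches(s, exclude)]
    if not kept:
        # Mirror the documented behaviour: never pass silently on an empty set.
        raise ValueError("design compare: every reference was filtered out")
    return kept


screens = ["Photometric-Saved-1", "batch-export", "led-design-bom-a", "login"]
print(scope_screens(
    screens,
    include=["photometric-saved-*", "batch-export"],
    exclude=["led-design-bom-*"],
))
```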

When to disable the gate entirely

If the references depict a populated app state your test database can't match (or they belong to an older design system), opt out completely with:

{
  "design_compare": {
    "enabled": false
  }
}

Dashboard view

The local dashboard renders the Figma frame next to the coded screenshot, sorted worst-to-best by score. Hover over a region to see the per-pixel diff overlay; click Open in Figma to jump to the source frame.

What changes between runs

Each run captures the design references it used into artifacts/runs/<run_id>/designs/. This makes the run reproducible — if a designer edits the Figma frame between runs, you can see exactly which version of the design any historical run was diffed against.
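
Because each run keeps its own copy of the references, finding out which designs changed between two runs comes down to comparing files. The sketch below hashes the PNGs in two captured designs/ folders; the helper names are hypothetical, and it assumes the captured folder mirrors the <screen>/<viewport>.png layout, which this page does not confirm. The demo builds two throwaway run folders so the example is self-contained.

```python
import hashlib
import tempfile
from pathlib import Path


def reference_digests(designs_dir: Path) -> dict[str, str]:
    """Map each PNG's relative path to the SHA-256 of its contents."""
    return {
        p.relative_to(designs_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(designs_dir.rglob("*.png"))
    }


def changed_references(old_dir: Path, new_dir: Path) -> list[str]:
    """List references that were added, removed, or edited between two runs."""
    old, new = reference_digests(old_dir), reference_digests(new_dir)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


# Demo with placeholder bytes standing in for real PNG data.
with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp) / "artifacts" / "runs"
    for run_id, desktop_bytes in (("run_a", b"v1"), ("run_b", b"v2")):
        home = runs / run_id / "designs" / "home"
        home.mkdir(parents=True)
        (home / "desktop.png").write_bytes(desktop_bytes)
        (home / "mobile.png").write_bytes(b"same")
    changed = changed_references(runs / "run_a" / "designs", runs / "run_b" / "designs")

print(changed)
```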

CI and the GitHub Action

Add FIGMA_API_TOKEN to your GitHub repo Secrets. The CodeLoop Verify Action picks it up automatically when present and surfaces the worst design regressions in the sticky PR comment.

Common gotchas

Frames not exported correctly. Make sure each Figma frame is a top-level frame (not a group) and has Export enabled in the right panel.
Token rate-limited. Figma personal access tokens are limited to roughly 6000 requests per hour. For very large frame maps, set scale: 1 and run the compare on changed screens only (--scope affected).
Aspect-ratio mismatch. The Figma frame and the captured screenshot must share an aspect ratio for a fair diff. Match your viewport widths to the frame widths in your design system.
Coloured background fills. Set the Figma frame fill to match the app's actual page background (e.g. dark mode); a mismatched fill is the most common cause of a “huge diff in an obvious place”.
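
The aspect-ratio gotcha is easy to check before running a diff. The following sketch compares width-to-height ratios with a small relative tolerance; same_aspect_ratio and the 1% default are assumptions chosen for illustration.

```python
def same_aspect_ratio(design_size, shot_size, tolerance=0.01):
    """Return True when two (width, height) pairs share an aspect ratio."""
    design_w, design_h = design_size
    shot_w, shot_h = shot_size
    design_ratio = design_w / design_h
    shot_ratio = shot_w / shot_h
    # Relative tolerance absorbs rounding from @2x exports.
    return abs(design_ratio - shot_ratio) <= tolerance * design_ratio


print(same_aspect_ratio((1440, 900), (2880, 1800)))  # True: same frame at @2x
print(same_aspect_ratio((1440, 900), (1440, 1024)))  # False: taller screenshot
```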

Visual review: baseline regressions between runs (different from design drift).
Core concepts: design reference
Tool reference