
Why Bugbot Misses Visual Regressions (and What Catches Them)

CodeLoop Team · April 27, 2026 · 5 min read


Cursor Bugbot ships with Cursor and is genuinely useful for catching static issues — missing null checks, unhandled promises, dead code. We use it. But there's a class of bugs Bugbot structurally cannot catch, and it's the class that hurts most when AI agents are writing your UI: visual regressions.

A visual regression is a change in the rendered output that looks wrong even though the code looks right. The LLM moved a Tailwind class. A rounded corner became sharp. A flex layout broke at 768px. A modal got stuck behind the navbar. A button changed color because a CSS variable was renamed. The diff *reads* clean. The page *looks* broken.

What it would take to catch these in code

In principle, you could try to detect "this change probably affected the rendering" purely from a diff. In practice, no static analyzer can do this reliably because:

- The relationship between code and pixels is mediated by the framework (React rendering, Tailwind compilation, Flutter widget tree).

- The same code can render differently across viewports, themes, and OS-level font rendering.

- Cascading style changes are non-local — moving one class on one component can affect siblings 6 layers away.

You have to actually render the page and look at it. That's what visual regression testing is.
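"Render consistently" is doing a lot of work in that sentence. Here's a minimal sketch of what a deterministic render fixture looks like for headed Playwright — the option names (`viewport`, `locale`, `timezoneId`, etc.) are Playwright's real `BrowserContext` options, but the fixture itself is illustrative, not CodeLoop's internals:

```typescript
// A deterministic render fixture: every capture uses the same viewport,
// locale, timezone, scale factor, and theme, so two runs of the same code
// produce comparable screenshots instead of flaky pixel noise.
const renderFixture = {
  viewport: { width: 1280, height: 720 }, // fixed viewport: no responsive drift
  deviceScaleFactor: 2,                   // fixed DPR: no sub-pixel surprises
  locale: "en-US",                        // fixed locale: stable number/date text
  timezoneId: "UTC",                      // fixed timezone: stable rendered times
  colorScheme: "light" as const,          // fixed theme: no dark-mode flips
};

// Shape of the capture step (requires a Playwright install; shown for shape only):
//   const context = await browser.newContext(renderFixture);
//   const page = await context.newPage();
//   await page.goto(url);
//   await page.screenshot({ path: "screens/home.png" });
```

Fonts are the one thing no context option pins down; those have to come from the host image (which is why "same fonts" is on the list above).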

What visual regression testing actually requires

Three pieces:

- A way to render every screen consistently. Headed Playwright for web, Flutter golden tests, simctl/adb screen capture for mobile. The fixtures need to be deterministic — same viewport, same locale, same timezone, same fonts.
- A baseline. A known-good set of PNGs (or Figma exports) to compare against.
- A diff that's interpretable. A pixel-by-pixel diff is too noisy (anti-aliasing, sub-pixel rendering). What you want is a structural diff (where did pixels change?), plus a percent-mismatch score, plus a model-readable rationale ("the submit button is now 12px taller and overlaps the email field").

CodeLoop ships all three:

- codeloop_capture_screenshot standardizes the render path across macOS / Windows / Linux and across web / Flutter / native.
- codeloop_design_compare reads from designs/ (PNGs or Figma exports via the API) and runs a structural pixel diff.
- codeloop_visual_review returns a per-screen LLM-readable rationale that the calling agent can act on without re-reading the screenshots itself.

The result is a gate that *blocks* "task complete" until every screen scores above your threshold (default 0.85). The agent can't declare victory while a button is overlapping an email field.
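The gate itself is simple to reason about. Here's a hypothetical sketch of the blocking logic — the names (`ScreenScore`, `gateTaskComplete`) are illustrative, not CodeLoop's API; what matters is that failure returns rationales the agent can act on, not raw pixels:

```typescript
// Illustrative sketch of a "task complete" gate: completion is blocked until
// every screen's visual score clears the threshold (names are hypothetical).
interface ScreenScore {
  screen: string;
  score: number;     // 0..1 similarity vs. baseline or design
  rationale: string; // model-readable explanation of any mismatch
}

function gateTaskComplete(scores: ScreenScore[], threshold = 0.85) {
  const failing = scores.filter((s) => s.score < threshold);
  return {
    pass: failing.length === 0,
    // Feed the failing rationales back to the agent instead of the screenshots.
    feedback: failing.map((s) => `${s.screen}: ${s.rationale} (score ${s.score})`),
  };
}

const result = gateTaskComplete([
  { screen: "login", score: 0.97, rationale: "matches baseline" },
  { screen: "signup", score: 0.62, rationale: "submit button overlaps email field" },
]);
// result.pass is false; result.feedback names only the signup screen
```

Because the feedback is text, the same loop that generated the broken UI can consume it and retry without a human in the middle.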

What this means for your loop

If you're letting Claude or GPT generate UI code in a fast loop, you need a screenshot gate, full stop. Bugbot will catch a Promise you forgot to await; it will not catch the modal you broke. Pair Bugbot with a screenshot-driven gate and you actually have something approximating a reliable AI-UI workflow.

CodeLoop is the screenshot-driven gate.

Try the visual review demo → · Set up design comparison →

Frequently asked questions

Does Cursor Bugbot do visual regression testing?

No. Bugbot is a static analyzer — it reads the code diff and flags issues like missing null checks and dead code. It does not render the UI or compare screenshots, so it cannot catch visual regressions.

What's the difference between visual regression and design comparison?

Visual regression compares the current UI against a previous baseline (did anything change?). Design comparison compares the current UI against the original Figma design (does it match the spec?). CodeLoop does both.

How does CodeLoop avoid noisy pixel diffs?

It uses a structural pixel diff (pixelmatch) plus an LLM-readable rationale, then scores each screen against a configurable threshold (default 0.85). Anti-aliasing and sub-pixel jitter are filtered out.
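To make "filtered out" concrete, here's a toy stand-in for the scoring idea (the real tool is pixelmatch; this sketch is not its implementation). Per-pixel differences under a tolerance — anti-aliasing jitter — are ignored; larger ones count toward the mismatch fraction that gets thresholded:

```typescript
// Toy noise-tolerant diff over grayscale pixel buffers: small differences are
// treated as rendering jitter, large ones as real visual change.
function mismatchScore(a: Uint8Array, b: Uint8Array, tolerance = 16): number {
  if (a.length !== b.length) throw new Error("images must be the same size");
  let mismatched = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > tolerance) mismatched++; // real change, not jitter
  }
  // Score is the fraction of pixels unchanged: 1.0 means identical screens.
  return 1 - mismatched / a.length;
}

const baseline = new Uint8Array([200, 200, 200, 200]);
const current  = new Uint8Array([201, 199, 200, 40]); // one pixel truly changed
const score = mismatchScore(baseline, current);
// score === 0.75: three pixels within tolerance, one genuine change
```

A naive diff would flag all four pixels here; the tolerance keeps the score reflecting only the change a human would actually see.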

Can I gate my CI on visual regressions?

Yes. CodeLoop's GitHub Action runs the same gate that runs locally; PRs fail when a screen scores below the threshold. See the GitHub Action docs.
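For shape, a CI gate in a workflow file looks roughly like this — note that the action name and inputs below are placeholders, not CodeLoop's published names; the GitHub Action docs have the real ones:

```yaml
# Sketch of a PR-gating workflow. "codeloop/visual-gate@v1" and "threshold"
# are PLACEHOLDER names for illustration only — consult the official docs.
name: visual-gate
on: [pull_request]
jobs:
  visual-regression:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run visual gate            # fails the job (and the PR) on low scores
        uses: codeloop/visual-gate@v1    # placeholder action name
        with:
          threshold: 0.85                # same default as the local gate
```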