Why Bugbot Misses Visual Regressions (and What Catches Them)
Cursor Bugbot ships with the Cursor editor and is genuinely useful for catching static issues — missing null checks, unhandled promises, dead code. We use it. But there's a class of bugs Bugbot structurally cannot catch, and it's the class that hurts most when AI agents are writing your UI: visual regressions.
A visual regression is a change in the rendered output that looks wrong even though the code looks right. The LLM moved a Tailwind class. A rounded corner became sharp. A flex layout broke at 768px. A modal got stuck behind the navbar. A button changed color because a CSS variable was renamed. The diff *reads* clean. The page *looks* broken.
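To make that concrete, here's a hypothetical example of the kind of diff an agent produces: one Tailwind class disappears, the review reads clean, and the layout breaks below 768px.

```tsx
import type { ReactNode } from "react";

// Hypothetical component illustrating a one-class visual regression.
// Before: column layout on mobile, row layout from md (768px) up.
export function Layout({ children }: { children: ReactNode }) {
  return (
    <div className="flex flex-col md:flex-row gap-4">
      <aside className="w-full md:w-64 shrink-0">sidebar</aside>
      <main className="flex-1">{children}</main>
    </div>
  );
}

// After the agent's edit, the only change in the diff is one class:
//   - <div className="flex flex-col md:flex-row gap-4">
//   + <div className="flex md:flex-row gap-4">
// The code still type-checks and the diff reads clean, but below 768px
// the sidebar and main content now render side by side and overflow.
```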
What it would take to catch these in code
In principle, you could try to detect "this change probably affected the rendering" purely from a diff. In practice, no static analyzer can do this reliably because:
- The relationship between code and pixels is mediated by the framework (React rendering, Tailwind compilation, Flutter widget tree).
- The same code can render differently across viewports, themes, and OS-level font rendering.
- Cascading style changes are non-local — moving one class on one component can affect components six layers away (a sketch follows this list).
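Here's a minimal hypothetical illustration of that non-locality, using a CSS variable rename of the kind mentioned above:

```css
/* Hypothetical stylesheet: the agent renames a design token at the root. */
:root {
  /* --brand: #2563eb;        before */
  --brand-primary: #2563eb;   /* after */
}

/* Six components away, this rule still references the old name.
   var(--brand) resolves to nothing, the button silently loses its
   background, and nothing in the diff itself flags the breakage. */
.submit-button {
  background: var(--brand);
}
```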
You have to actually render the page and look at it. That's what visual regression testing is.
What visual regression testing actually requires
Three pieces:
- A standardized way to render the UI and capture it, so screenshots are comparable across runs and platforms.
- A baseline to compare against: design files or previously approved renders.
- A comparison smarter than a naive pixel diff. A naive diff is too noisy (anti-aliasing, sub-pixel rendering). What you want is a structural diff (where did pixels change?) plus a percent-mismatch score, plus a model-readable rationale ("the submit button is now 12px taller and overlaps the email field").

CodeLoop ships all three:
- codeloop_capture_screenshot standardizes the render path across macOS / Windows / Linux and across web / Flutter / native.
- codeloop_design_compare reads from designs/ (PNGs or Figma exports pulled via the Figma API) and runs a structural pixel diff.
- codeloop_visual_review returns a per-screen LLM-readable rationale that the calling agent can act on without re-reading the screenshots itself.
The result is a gate that *blocks* "task complete" until every screen scores above your threshold (default 0.85). The agent can't declare victory while a button is overlapping an email field.
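For intuition, here is a minimal sketch of that gate from the calling agent's side. The tool interface below is an assumption made for illustration; only the tool names and the 0.85 default come from CodeLoop.

```ts
// Sketch of the gate loop. The interface is an assumption for
// illustration, not CodeLoop's actual API surface.
interface CodeLoopTools {
  captureScreenshot(screen: string): Promise<string>;                  // PNG path
  designCompare(screenshot: string, design: string): Promise<number>;  // 0..1 score
  visualReview(screenshot: string): Promise<string>;                   // rationale
}

const THRESHOLD = 0.85; // CodeLoop's default

export async function visualGate(
  tools: CodeLoopTools,
  screens: { name: string; design: string }[],
): Promise<void> {
  const failures: string[] = [];
  for (const { name, design } of screens) {
    const shot = await tools.captureScreenshot(name);
    const score = await tools.designCompare(shot, design);
    if (score < THRESHOLD) {
      // The rationale is what the calling agent acts on; it never has
      // to re-read the screenshots itself.
      const why = await tools.visualReview(shot);
      failures.push(`${name}: ${score.toFixed(2)} < ${THRESHOLD}. ${why}`);
    }
  }
  // "Task complete" stays blocked until every screen clears the threshold.
  if (failures.length > 0) {
    throw new Error(`Visual gate failed:\n${failures.join("\n")}`);
  }
}
```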
What this means for your loop
If you're letting Claude or GPT generate UI code in a fast loop, you need a screenshot gate, full stop. Bugbot will catch a Promise you forgot to await; it will not catch the modal you broke. Pair Bugbot with a screenshot-driven gate and you actually have something approximating a reliable AI-UI workflow.
CodeLoop is the screenshot-driven gate.
Frequently asked questions
Does Cursor Bugbot do visual regression testing?
No. Bugbot is a static analyzer — it reads the code diff and flags issues like missing null checks and dead code. It does not render the UI or compare screenshots, so it cannot catch visual regressions.
What's the difference between visual regression and design comparison?
Visual regression compares the current UI against a previous baseline (did anything change?). Design comparison compares the current UI against the original Figma design (does it match the spec?). CodeLoop does both.
How does CodeLoop avoid noisy pixel diffs?
It uses a structural pixel diff (pixelmatch) plus an LLM-readable rationale, then scores each screen against a configurable threshold (default 0.85). Anti-aliasing and sub-pixel jitter are filtered out.
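For a feel of the mechanics, here is a standalone sketch of that kind of diff using pixelmatch and pngjs directly. The file paths are illustrative, and this mirrors the idea rather than CodeLoop's internals.

```ts
import fs from "node:fs";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

// Read the baseline and current renders (must be the same dimensions).
const baseline = PNG.sync.read(fs.readFileSync("designs/login.png"));
const current = PNG.sync.read(fs.readFileSync("screenshots/login.png"));
const { width, height } = baseline;
const diff = new PNG({ width, height });

// includeAA: false (pixelmatch's default) skips anti-aliased pixels,
// which keeps sub-pixel rendering jitter out of the score.
const mismatched = pixelmatch(baseline.data, current.data, diff.data, width, height, {
  threshold: 0.1,
  includeAA: false,
});

// Percent-match score, plus a diff image showing where pixels changed.
const score = 1 - mismatched / (width * height);
fs.writeFileSync("login-diff.png", PNG.sync.write(diff));

if (score < 0.85) {
  console.error(`login: score ${score.toFixed(3)} is below the 0.85 threshold`);
  process.exit(1);
}
```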
Can I gate my CI on visual regressions?
Yes. CodeLoop's GitHub Action runs the same gate that runs locally; PRs fail when a screen scores below the threshold. See the GitHub Action docs.
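As a rough sketch, the workflow shape looks something like this. The action name and inputs here are placeholders; the GitHub Action docs have the real ones.

```yaml
# Hypothetical workflow. The action name and inputs are placeholders;
# consult the CodeLoop GitHub Action docs for the real ones.
name: visual-gate
on: [pull_request]
jobs:
  visual:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: codeloop/visual-gate@v1   # placeholder name
        with:
          designs: designs/
          threshold: 0.85   # same default as the local gate
```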