Changelog

What's new in CodeLoop. Every release, documented.

v0.1.88

June 8, 2026 (Latest)

codeloop-mcp-server, @codelooptech/shared

• Test the OTHER platform on demand. 0.1.87 made target detection mobile-first (iOS-first on a Mac), which is the right default but left no clean way to say "now test the Android app" — the verify auto-journey always re-ran iOS, and the new iOS→Android fallback only fired when iOS FAILED. You can now pin the platform: pass `target_type: "android_emulator"` (synonym "android") to codeloop_run_journey for a one-off Android run, or set `e2e.target` in .codeloop/config.json for a persistent default that the verify auto-journey honors too. The Android emulator boots on macOS just fine, so an iOS-default Mac can launch + drive the real Android app on request. Precedence: explicit run_journey target_type → e2e.target config → mobile-first auto-detection. Unrecognized e2e.target values fall through to detection instead of erroring
• The codeloop_run_journey description now instructs the agent to pass target_type for the requested platform whenever the user names one ("test the Android app", "run it on iOS"), so a plain-language request reliably switches the launch target instead of defaulting to the host-native platform
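
The precedence described above (explicit target_type, then e2e.target, then auto-detection) can be sketched roughly as follows. This is an illustration only, not CodeLoop's code: `resolve_target`, `normalize`, the `ios_simulator` name, and the set of known targets are assumptions, and treating an unrecognized explicit value the same way as an unrecognized config value is also an assumption.

```python
from typing import Optional

# Only the "android" -> "android_emulator" synonym comes from the notes above.
SYNONYMS = {"android": "android_emulator"}
# Assumed set of valid target names, for illustration only.
KNOWN_TARGETS = {"android_emulator", "ios_simulator"}


def normalize(value: Optional[str]) -> Optional[str]:
    """Return a recognized target name, or None when the value is missing or unknown."""
    if not value:
        return None
    name = value.strip().lower()
    name = SYNONYMS.get(name, name)
    return name if name in KNOWN_TARGETS else None


def resolve_target(explicit: Optional[str], config: dict, detected: str) -> str:
    """Explicit target_type wins, then e2e.target from config, then auto-detection."""
    configured = config.get("e2e", {}).get("target")
    # Unknown values normalize to None, so they fall through instead of raising.
    return normalize(explicit) or normalize(configured) or detected
```

With this shape, a config value such as `"bogus"` simply yields the detected target, which matches the "fall through to detection instead of erroring" behaviour described above.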

v0.1.87

June 7, 2026

codeloop-mcp-server

• The auto-journey from 0.1.86 WORKED on WedCheese log-11 — for the first time verify launched + drove the app on its own, with no skipped second tool call. But it still captured 0 screenshots for two reasons that turned out to be CodeLoop bugs, not the project's: (1) it targeted macOS DESKTOP instead of the phone, and (2) the iOS build check reported a false failure. Both are fixed here. Target detection is now MOBILE-FIRST: a Flutter app that ships ios/ or android/ is treated as a mobile app even when the default macos/windows/linux scaffolds are present. Previously any Flutter app with a macos/ folder was routed to a desktop launch on a Mac — so WedCheese ran `flutter run` for macOS, broke on `gal` requiring macOS >= 11.0, and never tried the iOS or Android apps at all. Desktop is now chosen only when NO mobile platform exists (a genuinely desktop-only Flutter app)
• The iOS build check (xcode_build) now builds a FLUTTER project through the Flutter toolchain (`flutter build ios --simulator --no-codesign`) instead of raw `xcodebuild` on the Runner project. Raw xcodebuild skips the Flutter assemble phases and CocoaPods integration, so it failed with `Module 'cloud_firestore' not found` — every pod search path missing — even though the pods WERE installed and the developer's own `flutter build ios --simulator` succeeded. That false negative was blocking verify (and therefore the gate) on a build that actually worked. Native (non-Flutter) Xcode projects still use xcodebuild as before
• run_journey now FALLS BACK to the other mobile platform when the first can't open. The host-native target is tried first (iOS simulator on macOS); if it has no device or the app won't launch, CodeLoop boots + launches the alternate platform the project ships (e.g. Android) once before giving up — so one broken platform no longer means "0 screenshots" when the other builds fine (WedCheese: gradle/Android passed while iOS pods were mid-repair). The result still reports the primary target and folds both attempts into the notes; the fallback only offers a platform the project actually has (ios/ or android/) and only offers iOS on a macOS host
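
A minimal sketch of the mobile-first rule described above, assuming detection works from the names of the top-level platform folders. The function name and the return labels are hypothetical.

```python
from typing import Iterable

MOBILE_DIRS = {"ios", "android"}
DESKTOP_DIRS = {"macos", "windows", "linux"}


def classify_flutter_project(top_level_dirs: Iterable[str]) -> str:
    """Classify a Flutter project from the names of its top-level folders."""
    dirs = set(top_level_dirs)
    # Any mobile folder wins, even when the default desktop scaffolds are present.
    if dirs & MOBILE_DIRS:
        return "mobile"
    # Desktop is chosen only when no mobile platform exists.
    if dirs & DESKTOP_DIRS:
        return "desktop"
    return "unknown"
```

Under this rule a project with `lib/`, `ios/`, and `macos/` is classified as mobile, while one with only `lib/` and `macos/` stays desktop.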

v0.1.86

June 7, 2026

codeloop-mcp-server, @codelooptech/shared

• verify now LAUNCHES + drives the app itself — the structural fix for "the app never opened" (WedCheese logs 7-10). Across four releases we removed every SYSTEM-generated reason the agent stopped before the visual pass (audit leak in 0.1.83, hung-test ASK in 0.1.84, stale-counter escalation in 0.1.85), yet log 10 showed the agent stopping anyway — this time on its OWN judgement ("you asked me to verify, not fix; 642 uncommitted changes; I'll ask about scope") — so codeloop_run_journey, a SEPARATE opt-in tool, was never called and the app was never launched. The root cause was architectural: visual verification depended on the agent voluntarily making a second tool call, and a cautious agent facing a failing verify always finds a reason to report-and-ask instead. So the deep-E2E journey (launch the app / boot the simulator+emulator, drive the planned journey via Maestro, screenshot every screen, record video) is now the FINAL PHASE of a full codeloop_verify on a UI project — it runs automatically, with no second tool call to skip. A `launch_failed` result is still produced and folded into the verify notes (it IS the evidence the app can't open), so a broken build can no longer silently mean "no visual evidence"
• The journey runs in every mode (driving the app is verification, not a code edit, so "don't modify my code" never waives it) and on full scope only (quick `affected` checks stay fast). Project owners can opt out with `e2e.auto_journey:false` (or the existing `evidence.capture_screenshots:false`) in .codeloop/config.json to restore the directive-only behaviour — e.g. CI where booting a device is undesirable. When the journey auto-runs, the old "MANDATORY NEXT STEP — call codeloop_run_journey" directive is suppressed (it already ran); when opted out, the directive still fires as before

v0.1.85

June 7, 2026

codeloop-mcp-server

• Stale loop counters no longer escalate a FRESH session — the real WedCheese log-9 regression. The H7 auto-fix counters (`diagnose_attempts` / `gate_attempts`) live in `~/.codeloop/loop-state.json` and only reset on a passing gate or a manual `codeloop_doctor --reset-loop-state`. A project that never reaches green (genuine pre-existing build failures) accumulated `diagnose_attempts` across days, so the FIRST codeloop_diagnose of a brand-new session immediately returned "9 attempts → escalate to the user" — and that escalation directive let the agent HALT before it ever launched/drove the app (or completed the Firebase re-auth). The counters now carry a 2-hour staleness window: when the last loop activity is older than that, the count resets to 0 on read/increment (a new session = a fresh loop), while a genuine runaway still trips the cap because 8+ diagnoses happen well within the window. Same self-clean pattern as the audit-mode TTL
• Escalation pauses the SOURCE-CODE loop ONLY — both escalation messages (diagnose cap + gate cap) now state explicitly that hitting the cap does NOT exempt the read-only evidence: if the app hasn't been driven this session the agent must STILL call codeloop_run_journey once (launch + drive + screenshots — verification, not a code edit) and complete any [CodeLoop AUTH] re-auth the run flagged (e.g. `firebase login --reauth` in an interactive terminal), THEN report. This closes the last "stop everything" off-ramp that let the agent skip launching the app across WedCheese logs 7/8/9 — audit (0.1.83) and the hung-test ASK (0.1.84) were the earlier two
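
The staleness window above can be pictured with a small counter that resets itself when the last activity is too old. This is a sketch under assumptions: the class and field names are hypothetical, and only the 2-hour window comes from the notes.

```python
import time
from dataclasses import dataclass
from typing import Optional

STALE_AFTER_SECONDS = 2 * 60 * 60  # the 2-hour window described above


@dataclass
class LoopCounter:
    attempts: int = 0
    last_activity: float = 0.0  # epoch seconds of the last increment

    def increment(self, now: Optional[float] = None) -> int:
        """Count one attempt, first resetting the count if the loop has gone stale."""
        now = time.time() if now is None else now
        if now - self.last_activity > STALE_AFTER_SECONDS:
            self.attempts = 0  # a new session starts a fresh loop
        self.attempts += 1
        self.last_activity = now
        return self.attempts
```

A burst of attempts inside the window keeps accumulating toward the cap, while the first attempt after a long gap starts again from 1.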

v0.1.84

June 7, 2026

codeloop-mcp-server

• A hung test suite no longer STOPS the visual pass — the real WedCheese log-8 regression. When the Flutter test runner hangs mid-run, verify emits a `[CodeLoop ASK]` note asking the user whether to skip the suite or fix it; that note said only "wait for the user's choice" with no carve-out, so the agent halted at diagnose and NEVER called codeloop_run_journey (the app was never launched or driven — exactly the failure this whole roadmap targets). Both ASK notes (HUNG MID-RUN and TIMED OUT) now state explicitly that the Y/N is ONLY about the TEST SUITE and does NOT pause the rest of the run: the agent must STILL launch + drive the app (codeloop_run_journey) and gate_check this same session; only the test-suite RE-RUN waits for the user
• The MANDATORY run_journey directive now survives failures + questions — it explicitly says it runs even when this verify reported FAILURES or asked a Y/N question, so the agent doesn't stop at diagnose or wait at an ASK. A `launch_failed` result IS the evidence the user wants (it proves whether the app builds/launches at all) and is never a reason to skip the launch+drive pass
• Mobile-aware auto-capture note — a MOBILE Flutter app ships the default macos/windows/linux folders, so the desktop-app fullscreen-capture refusal fired and handed the agent a DESKTOP-flavoured note pointing at `evidence.target_app` (a desktop concept) while never mentioning the mobile launch config it actually needs. For a mobile project (Flutter with ios/ or android/, or an Android-platform project) the skip note now points at `e2e.ios_device` / `e2e.android_avd` + codeloop_run_journey and no longer recommends `evidence.target_app`. The fullscreen-capture refusal itself is unchanged — the IDE is still never valid evidence

v0.1.83

June 6, 2026

codeloop-mcp-server, @codelooptech/shared

• Read-only (audit) mode can no longer LEAK across sessions — the real WedCheese regression. A leftover `.codeloop/agent_mode.json` saying "audit" (written in an earlier read-only session) was being committed/left on disk; on a fresh, NEUTRAL run ("just verify the project") the agent saw it in git status, concluded "the project is in audit mode", and re-asserted mode:"audit" on the next tool call — bypassing the 30-min TTL entirely and silently turning a fix-mode run read-only (so it never launched/drove the app). Three reinforcing fixes: (1) a STALE persisted audit is now DELETED on read, so the misleading artifact can't be seen or re-asserted; (2) `.codeloop/agent_mode.json` + `.codeloop/loop_state.json` are now gitignored by init so per-session state never travels via the repo; (3) the `mode` param description now explicitly forbids inferring audit from the file / git status — audit is decided SOLELY from the user's CURRENT message
• Audit mode still drives the app — the diagnose audit directive no longer reads as "don't launch anything". It now states unambiguously that the read-only VISUAL evidence (codeloop_run_journey: launch + drive + screenshots) must STILL be gathered once even in audit; only SOURCE-CODE edits and the fix-loop are paused. Driving the app is verification, not a modification

v0.1.82

June 6, 2026

codeloop-mcp-server

• Physical device support — codeloop_run_journey now detects connected REAL devices (`adb devices` for Android, skipping `emulator-*`; `xcrun xctrace list devices` for iOS, the Devices section only) and launches directly on them, skipping emulator/simulator boot entirely. Set `e2e.device_id` to target a specific device/emulator/simulator by id; when it names a connected device CodeLoop uses it as-is, otherwise it's treated as an AVD/simulator to boot. iOS physical devices still require a signing identity (the build passes `-allowProvisioningUpdates`)
• AI chatbox reply CAPTURE — fulfilling 'analyse the answer from the AI', the journey no longer just types + screenshots. With the opt-in Flutter driver it reads the on-screen reply text deterministically (longest non-prompt Text widget → `CODELOOP_REPLY:`) and surfaces it in the run summary + directive, telling the agent to JUDGE whether the answer is correct/substantive — not merely non-empty. The default Maestro path waits for a named reply region (`extendedWaitUntil`) and `copyTextFrom` when the plan provides a reply label, and otherwise points the agent at the captured screenshots for the answer
• Native mobile launch now PROVES liveness too — the post-launch liveness + non-blank-first-frame check (with a one-time relaunch retry) that the Flutter path already had is now also applied to NATIVE iOS-simulator (`xcodebuild` + `simctl install/launch`) and Android-emulator (`gradlew installDebug` + monkey) launches. A native app that installs but crashes on launch or renders a blank screen is no longer reported as `launched` — closing the 'app never opened' gap for native apps, not just Flutter, so CodeLoop never drives a dead/blank native screen
• Guaranteed-invocation backstop fix — the interaction_evidence gate (and the check_workflow '3b. Deep-E2E journey' step) read run_journey's evidence from `artifacts/runs/`, but a path double-nest (`join(baseDir,'runs')` on a baseDir that already ended in `/runs`) made them scan `artifacts/runs/runs` and NEVER find it — so the blocker gate could not pass even after the app was driven, and step 3b stayed PENDING forever. Both now read the correct directory, and a Pillar-C regression test pins that the gate FAILS (as a blocker) for a UI project with no interaction evidence and PASSES the moment run_journey writes interaction_evidence.json, so the 'app never driven' loop can't silently regress
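
The path double-nest in the last item is easy to reproduce. The sketch below uses Python's `posixpath` to mirror the `join(baseDir,'runs')` call. The guard in `runs_dir_fixed` is one possible fix written for illustration, not necessarily the one that shipped, and both function names are hypothetical.

```python
import posixpath


def runs_dir_buggy(base_dir: str) -> str:
    # Always appends "runs", even when base_dir already ends with it.
    return posixpath.join(base_dir, "runs")


def runs_dir_fixed(base_dir: str) -> str:
    # Append "runs" only when base_dir does not already point at it.
    trimmed = base_dir.rstrip("/")
    if posixpath.basename(trimmed) == "runs":
        return trimmed
    return posixpath.join(trimmed, "runs")
```

Calling `runs_dir_buggy("artifacts/runs")` produces the `artifacts/runs/runs` path that the gate was scanning.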

v0.1.81

June 6, 2026

codeloop-mcp-server

• Higher Flutter interaction hit-rate (default Maestro path) — generated Maestro flows now use NON-FATAL matches: a navigation/button tap that can't find its label no longer aborts the whole run, so the rest of the journey and its screenshots still execute (the dominant cause of 'the interactions never happened' on label-sparse Flutter UIs). Flows also insert a settle wait (`waitForAnimationToEnd`) after navigation so the next match fires against a rendered screen, infer field labels more robustly (label → selector tail → hint, trailing `:`/`*` stripped), and — when a label genuinely can't be matched — report EXACTLY which label(s) Maestro missed (`Maestro could not match: …`) instead of a raw log tail, so the agent can add a Semantics label or enable the driver. Labels are never invented
• Opt-in high-fidelity Flutter driver — set `e2e.flutter_driver: true` and CodeLoop scaffolds a generated `integration_test` driver (widget finders: find.text / find.widgetWithText / find.byType) into artifacts/runs/<id>/ — NEVER into your test/ or lib/ — imports your app via `package:<name>/main.dart`, and runs it on the booted device with `flutter test -d <device>`. This is the deterministic answer for unlabeled canvases. It degrades to a one-line directive (and falls back to Maestro) when the project doesn't depend on integration_test — CodeLoop never edits your pubspec
• Native iOS launch hardening — `xcodebuild` for the simulator now passes `CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO CODE_SIGN_IDENTITY=` (simulator builds need no signing — this kills the most common native-iOS build failure) plus `-allowProvisioningUpdates`, on top of the existing workspace-vs-project + non-test scheme resolution. `pod install` runs in the mobile build pre-flight before the build

v0.1.80

June 6, 2026

codeloop-mcp-server, @codelooptech/shared

• Mobile build PRE-FLIGHT — the dominant 'the app never opened' root cause was a skipped build setup, so codeloop_run_journey now runs a pre-flight BEFORE launching a Flutter/mobile app: it runs `flutter pub get`, runs `pod install` when an ios/Podfile exists (macOS), and checks every asset declared under `flutter: assets:` in pubspec.yaml against disk. In FIX mode it REMEDIATES automatically (runs the dep commands and creates missing declared asset DIRECTORIES — build setup, not app-source edits); in AUDIT mode it only LISTS the fixes. Build-failure signatures are mapped to specific repair steps (`Module 'cloud_firestore' not found` → `cd ios && pod install`; a missing asset → create the dir/file or remove the pubspec line; an expired `firebase login` → the [CodeLoop AUTH] re-auth push) instead of a raw error tail. The unconditional `--no-pub` was removed from `flutter run` so a skipped pre-flight still resolves deps
• Launch + interaction RELIABILITY — a 'ready' log line is no longer trusted on its own. After launch, CodeLoop now PROVES the mobile app is live (`adb shell pidof` / `dumpsys` focus on Android; `simctl spawn launchctl list` on iOS) AND rejects a BLANK/all-black first frame (decoded screenshot uniformity check), retrying the launch ONCE before returning launch_failed. Ready-detection is also robust to a MISSED log line: while waiting it periodically probes device liveness in parallel, so the app is detected as up even if `flutter run`'s ready line is never seen (no more false 'App did not finish launching' timeouts). The device window is brought frontmost before screenshotting so captures show the APP, not the IDE/desktop. In fix mode a launch_failed feeds the pre-flight repair tasks and the agent re-runs the journey automatically
• Forward-compatible e2e config — `e2e.device_id` (target a specific connected device/emulator/simulator by id) and `e2e.flutter_driver` (opt in to the high-fidelity integration_test driver) are now recognized in .codeloop/config.json
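
To make the blank-first-frame rejection concrete, here is a rough sketch of a uniformity check over already-decoded pixels. It assumes the screenshot has been decoded into RGB tuples elsewhere. The function name, the tolerance, and the outlier ratio are illustrative assumptions, not CodeLoop's actual thresholds.

```python
import statistics
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def is_blank_frame(
    pixels: Sequence[Pixel],
    tolerance: int = 8,
    max_outlier_ratio: float = 0.01,
) -> bool:
    """True when nearly every pixel sits within `tolerance` of the per-channel median."""
    if not pixels:
        return True  # nothing decoded: treat as blank
    # Per-channel median is a reference colour that a few stray pixels cannot shift.
    ref = tuple(statistics.median(channel) for channel in zip(*pixels))
    outliers = sum(
        1 for p in pixels if any(abs(c - r) > tolerance for c, r in zip(p, ref))
    )
    return outliers / len(pixels) <= max_outlier_ratio
```

An all-black or all-white frame is reported as blank, while a frame with substantial colour variation is not.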

v0.1.79

June 6, 2026

codeloop-mcp-server

• Read-only (audit) mode is now OPT-IN and auto-expiring — it no longer silently sticks. Previously, once an agent ran a check in audit mode (because the user said 'don't modify my code, just list the problems'), the choice was persisted to .codeloop/agent_mode.json with NO expiry, so EVERY later run in that project stayed read-only — CodeLoop quietly stopped checking the gate, scoring confidence, and auto-fixing even when the user never asked for read-only again (the WedCheese 'nothing happens' symptom). Audit is the user's call, not CodeLoop's: persisted audit now expires after 30 minutes of inactivity and every audit tool call refreshes that window, so a genuine audit session never lapses while an abandoned one reverts to the active fix default. The instant the agent stops passing mode:'audit' (or passes mode:'fix'), CodeLoop resumes the full verify → diagnose → fix → re-verify → gate loop automatically. Persisted 'fix' never expires (it IS the default) and an explicit config.agent_mode:'audit' opt-in is still honored with no TTL
• Clearer mode guidance — the shared `mode` param description now tells the agent to LEAVE IT UNSET for normal active fixing, to set 'audit' ONLY when the user explicitly asked for a no-modification findings list (never to infer it), and to pass mode:'fix' (or omit it) to resume the auto-fix loop the moment the user wants fixes again

v0.1.78

June 6, 2026

codeloop-mcp-server

• Expired CLI login now also PUSHES re-authentication from the VISUAL path — previously the hard [CodeLoop AUTH] directive (with the exact interactive login command, e.g. `firebase login --reauth`) only fired when an auto-detected backend failed to start during codeloop_verify. Now codeloop_run_journey also scans the app-launch failure output: if a mobile/Flutter app can't open because a provider CLI session expired (Firebase, gcloud, Heroku, Vercel, Fly, AWS, Azure, GitHub, Supabase), it emits the same [CodeLoop AUTH] push with a context that explains the visual journey (screenshots + interactions) couldn't run and tells the agent to re-run codeloop_run_journey after the user signs in. Closes the gap where an expired login blocked the simulator/emulator visual verification but the agent only saw a raw error tail

v0.1.77

June 6, 2026

codeloop-mcp-server

• FULL label-based mobile interactions — codeloop_run_journey now drives a REAL interaction suite (type into named fields, tap by visible label, submit, assert) on Android emulators and iOS simulators via Maestro, instead of degrading every labelled step to a manual follow-up. The journey plan (per-entity CRUD arcs + the AI-chatbox arc) is translated into a Maestro flow that types realistic values, taps Save/Add/Delete by label, waits for AI replies (extendedWaitUntil), and screenshots each step — all while the video recording is running. The WedCheese symptom ('the simulator never activated and the interactions never happened') was three bugs stacked: run_journey wasn't invoked, the app wasn't launched, and the mobile engine was coordinate-only with no widget targeting — all three are now closed
• Native iOS & Android apps launch for real — app_launcher no longer emits a directive-only stub for non-Flutter mobile. Android: `./gradlew :app:installDebug` (falls back to installDebug) then launches via `adb shell monkey` against the resolved applicationId. iOS: resolves the scheme from `xcodebuild -list -json`, builds for the booted simulator, locates the .app in DerivedData, then `xcrun simctl install` + `simctl launch <bundleId>`. Build/launch failures surface as launch_failed with the error tail; missing tooling/scheme degrades cleanly to the exact build command
• Right app, every time — a new resolver reads the Android applicationId (build.gradle / build.gradle.kts, AndroidManifest fallback) and the iOS PRODUCT_BUNDLE_IDENTIFIER (Runner.xcodeproj / native *.xcodeproj, skipping test targets) and injects it into the generated Maestro flow header so install/launch/drive all target the same app
• Reliable invocation (two layers) — (1) codeloop_verify now emits a MANDATORY next-step in fix mode telling the agent to call codeloop_run_journey (and a clear audit-mode note that the app is not driven read-only); codeloop_check_workflow gained a '3b. Deep-E2E journey' step that stays PENDING until the app is driven. (2) A new applicable-or-n/a interaction_evidence gate in codeloop_gate_check blocks ready_for_review for UI projects until the app has actually been driven (by run_journey OR the manual interact flow), is n/a for non-UI projects and in audit mode, and points at the one-call run_journey path when no interaction happened
• Graceful degrade, never hang — when Maestro isn't installed the journey keeps the coordinate engine for what it can do and surfaces the one-line install directive (`curl -Ls https://get.maestro.mobile.dev | bash`) as a follow-up; browser/desktop targets are unchanged (Playwright selectors already give full interactions)
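
As an illustration of the applicationId lookup mentioned above, the sketch below pulls the id out of Gradle build script text. It is a simplified assumption of how such a resolver could work: it handles only a literal string value in Groovy (`applicationId "..."`) or Kotlin DSL (`applicationId = "..."`) form, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches `applicationId "com.x"` (Groovy) and `applicationId = "com.x"` (Kotlin DSL).
# The quote requirement right after the optional "=" keeps applicationIdSuffix from matching.
APPLICATION_ID = re.compile(r"""\bapplicationId\s*=?\s*["']([^"']+)["']""")


def find_application_id(gradle_text: str) -> Optional[str]:
    """Return the first literal applicationId in the build script, or None."""
    match = APPLICATION_ID.search(gradle_text)
    return match.group(1) if match else None
```

Values built from variables or flavor-specific overrides are outside the scope of this sketch.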

v0.1.76

June 6, 2026

codeloop-mcp-server

• Expired CLI login now PUSHES re-authentication instead of failing quietly — when a provider CLI session has expired (Firebase, gcloud, Heroku, Vercel, Fly, AWS, Azure, GitHub, Supabase), CodeLoop detects it from the backend/log output and emits a hard [CodeLoop AUTH] directive with the EXACT interactive login command (e.g. `firebase login --reauth`) for the agent to run so the user signs in. It explicitly forbids 'fixing' source code for an auth problem and tells the agent to re-run codeloop_verify + codeloop_run_journey afterward so the full backend + visual verification proceeds. Covers both the backend-start path (e.g. the Firebase Emulator Suite couldn't start) and the remote-log pull path
• codeloop_run_journey now actually OPENS the app — the deep-E2E executor previously booted the emulator/simulator but never launched the app on it, so there was nothing to screenshot or drive ('the app never opened'). It now performs the missing step: `flutter run -d <device>` builds, installs, and launches the app on the booted Android emulator / iOS simulator (or `-d macos|windows|linux` for Flutter desktop), waits for the first frame, drives the journey, then quits the app cleanly. A build/launch failure (e.g. an unresolved iOS module) is surfaced as launch_failed with the error tail — real signal that the app doesn't start — instead of silently driving a blank screen. Native (non-Flutter) iOS/Android degrade to a precise build+launch directive rather than pretending the app is up
• Firebase setup-failure no longer produces false criticals — when an auto-detected Firebase backend can't start because the developer's CLI session expired (`firebase emulators:start` → 'Your credentials are no longer valid. Please run firebase login --reauth'), CodeLoop now treats it as an ENVIRONMENT/setup step, not a code bug. Those CLI auth lines are excluded from the runtime_log_clean gate and are no longer misclassified by codeloop_diagnose as critical 'Build/compile errors'. A WedCheese audit on a machine with an expired firebase login had surfaced 4 false-critical 'build errors' + 2 'runtime exceptions' that were really just 'please log in'
• Auto-detected backends that fail to START are setup noise, not runtime evidence — backend.log is now folded into the runtime-log scan ONLY when the server actually became ready. A failed auto-start (unauthenticated emulator, missing CLI) degrades to a clear [CodeLoop ASK] setup hint (e.g. `firebase login --reauth` / install the Firebase CLI / set backend.enabled:false) instead of blocking the gate. Config-specified backends that fail are still real and still reported
• scanLogText now ignores CLI auth/setup lines across ALL scanned logs (backend + pulled remote/hosting logs): 'credentials are no longer valid', '… login/auth', 'not authenticated', 'command not found', 'failed to list log entries' — so an expired `gcloud`/`heroku`/`vercel`/`firebase` session can't false-trip runtime_log_clean_evidence
• Deep-E2E discoverability — codeloop_verify now points the agent at codeloop_run_journey whenever a UI-capable project (Flutter/web/iOS/Android/desktop) finishes a cycle with no screenshots captured: one call launches the app (or boots the emulator/simulator), drives the journey, screenshots every screen, and records video. It also clarifies that audit/read-only mode does NOT launch or drive the app — switch to fix mode (or capture read-only screenshots manually) for visual verification
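
The log filtering described for scanLogText can be sketched as a line filter. Only the four phrases quoted in full above are used here. The elided '… login/auth' pattern is left out because its exact form is not given, and the function name is hypothetical.

```python
import re
from typing import List

# Phrases quoted in the notes above; matching is case-insensitive by assumption.
SETUP_NOISE = re.compile(
    r"credentials are no longer valid"
    r"|not authenticated"
    r"|command not found"
    r"|failed to list log entries",
    re.IGNORECASE,
)


def drop_setup_noise(log_text: str) -> List[str]:
    """Return the non-empty log lines that are not CLI auth/setup noise."""
    return [
        line
        for line in log_text.splitlines()
        if line.strip() and not SETUP_NOISE.search(line)
    ]
```

Only the surviving lines would then be considered as runtime evidence.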

v0.1.75

June 6, 2026

codeloop-mcp-server, @codelooptech/shared

• Deep-E2E executor — new `codeloop_run_journey` tool launches the app, drives a real user journey, and captures screenshots + video in ONE hands-free call. It detects the target, READY/LAUNCHes per platform (web → headed Playwright at e2e.web_url; desktop → launches evidence.target_app; Android/iOS → BOOTS the emulator/simulator), plans the journey, records, drives every deterministic step, screenshots each, visits EVERY discovered screen, and hands back a replay + gate directive. This is the autonomous counterpart to the manual plan→start_recording→interact→stop→replay sequence the WedCheese verification was missing
• Real device boot — `bootDevice()` opens an Android AVD (`flutter emulators --launch` / `emulator -avd`, then waits for `sys.boot_completed`) or an iOS simulator (`xcrun simctl boot` + `open -a Simulator` + `bootstatus`). It reuses an already-booted device, honors `e2e.android_avd` / `e2e.ios_device`, auto-picks the first available otherwise, and degrades with copy-paste instructions when the tooling is absent. Booting only ever happens inside `codeloop_run_journey` — never in `codeloop_verify` — and the tool REFUSES in audit/read-only mode (driving the app modifies its state)
• AI-chatbox arc — the journey planner now emits an executable 'type prompt → wait → assert a non-empty AI reply' arc, so the chatbox/AI-answer flow (type a question, read back the model's answer) is driven and verified automatically, not just suggested
• Backend's OWN tests, in detail, for ANY backend — `codeloop_verify` now locates and runs the backend's dedicated unit/integration suite when it lives in a subdir (functions/, server/, api/, backend/, a .csproj) via `npm test` / `pytest` / `go test` / `dotnet test` / `mvn test` / `gradlew test`, surfaced as a new `backend_tests_evidence` gate feeding required_tests_pass. Skips cleanly (n/a) when no backend test target exists
• Firebase / Firestore support (WedCheese's stack) — auto-detects `firebase.json` / `.firebaserc` / `functions/` and starts the Emulator Suite (`firebase emulators:start --only firestore,functions,auth`), runs the Cloud Functions suite + Firestore security-rules tests against the running emulator with the right `FIRESTORE_EMULATOR_HOST` / `FUNCTIONS_EMULATOR` env, smoke-probes without false-failing on project-specific endpoints, and gates the Firestore schema on `firestore.rules` / `firestore.indexes.json` presence instead of ORM migration drift. Degrades to n/a when the Firebase CLI is absent
• Cross-platform: the executor and backend verification work on Windows/macOS/Linux with Cursor or Claude Code, for web / desktop / Flutter / native iOS / native Android

v0.1.74

June 5, 2026

codeloop-mcp-server

• Accurate Flutter analyze counts — `flutter analyze` lists info, warning AND error severities plus a trailing 'N issues found', and the primary analyze runner used to report that ENTIRE total as failures. A WedCheese audit with 2,517 findings (2,515 info lints like avoid_print in tool/ scripts, 1 warning, ONE real error) showed up as '2524 failed', slammed confidence to 0, and failed the build gate on lint noise. The runner now classifies by severity and fails ONLY on errors (matching the deep-internal static-analysis gate), so the same run reports the handful of REAL failures instead of thousands
• The warning/info breakdown is still surfaced ('N error(s) [BLOCKING], W warning(s), I info(s) [NON-BLOCKING]') so lint debt stays visible without inflating the failure count. build_status also no longer marks a Flutter build 'failed' when a non-zero analyze exit was caused purely by warnings (--no-fatal-infos suppresses only infos, so warnings alone still produce a non-zero exit)
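
A rough sketch of severity-based counting, assuming each issue line of `flutter analyze` output starts with its severity followed by a `•` (or `-`) separator. The sample lines in the usage note are made up for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# An issue line is assumed to look like: "  info • message • path:line:col • rule".
ISSUE_LINE = re.compile(r"^\s*(error|warning|info)\s+[•-]\s")


def classify(output: str) -> Counter:
    """Count analyzer issues per severity, ignoring the trailing summary line."""
    counts: Counter = Counter()
    for line in output.splitlines():
        match = ISSUE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def analyze_failed(output: str) -> bool:
    """Only errors are blocking; warnings and infos are reported but do not fail."""
    return classify(output)["error"] > 0
```

Fed a run with one info, one warning, and one error, `classify` reports each severity separately and `analyze_failed` is true only because of the single error.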

v0.1.73

June 1, 2026

codeloop-mcp-server

• Honest key diagnostics — an EXPIRED key (trial or paid period ended) is no longer mislabeled as 'revoked'. The MCP error now says the key EXPIRED, that nobody revoked it, and points at billing / contributions instead of telling you to swap a key in mcp.json (which made a WedCheese trial-expiry look like a key someone had revoked)
• Paired with a backend fix (server-side): the free-trial bonus you earn from approved contributions now keeps your trial key alive even when the bonus was granted onto a different/rotated key — validateKey re-derives the true end from created_at + 14 days + lifetime contribution bonus and self-heals a key that was prematurely expired, and the dashboard 'Free Use Bonus' no longer shows '-- Remaining' while advertising '+N bonus days'
• Free-use countdown is now computed at the USER level (/v1/keys returns free_trial_end + free_days_remaining): it spans every trial key regardless of status — a bonus-extended expiry sitting on a rotated/revoked key is still surfaced — and falls back to account-creation when you hold zero keys, so 'Free Use Bonus' shows real days remaining instead of '--' for accounts with no active key
• Cancelled-subscription recovery: a paid key that Stripe flipped to 'expired' when the subscription ended now REACTIVATES as a free trial key (downgraded to free-tier limits) whenever your earned contribution days still cover the present — both on key validation (validateKey) and on dashboard load (reconcile). A former subscriber with banked free days is no longer locked out behind a dead 'expired' key
• Website auth hardening — intermittent Google/OAuth ?error=Configuration now routes to a retryable /login page with a friendly 'temporary hiccup — try again' notice instead of dead-ending on Auth.js's default error screen; AUTH_SECRET + AUTH_TRUST_HOST set on Vercel make cold-start config resolution deterministic
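
The trial-end derivation mentioned above (created_at + 14 days + lifetime contribution bonus) is plain date arithmetic. In this sketch the function names are borrowed from the response fields named above, and rounding a partial day up is an assumption.

```python
import math
from datetime import datetime, timedelta, timezone

TRIAL_DAYS = 14  # base trial length from the notes above


def free_trial_end(created_at: datetime, bonus_days: int) -> datetime:
    """Trial end = creation time + base trial + lifetime contribution bonus."""
    return created_at + timedelta(days=TRIAL_DAYS + bonus_days)


def free_days_remaining(created_at: datetime, bonus_days: int, now: datetime) -> int:
    """Whole days left, never negative. Rounding partial days up is an assumption."""
    remaining = free_trial_end(created_at, bonus_days) - now
    return max(0, math.ceil(remaining.total_seconds() / 86400))
```

For an account created on May 1 with 7 bonus days, the trial ends on May 22, so two days remain on May 20.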

v0.1.72

June 1, 2026

codeloop-mcp-server

• Live verify progress — codeloop_verify now streams MCP notifications/progress so a long run never LOOKS frozen: Cursor's tool-call card shows the current phase, elapsed time, and a rough ETA, ticking on a ≤10s heartbeat even past the 3-minute mark
• Closes the last WedCheese symptom: after the 0.1.71 test-hang fix, the real long pole was a legitimately slow native Android AOT compile (gradle assembleDebug → gen_snapshot, ~7 min) running silent — so the developer interrupted verify and orphaned the build processes
• Per-phase labels make the wait legible — 'Analyzing Dart code', 'Running widget/unit tests', 'Building Android app (gradle + AOT/gen_snapshot)', 'Verifying backend', 'Finalizing report' — and the message flips to an honest 'still running (longer than expected)' once past the estimate
• Fully best-effort + opt-in: progress only emits when the client requests it (passes a progressToken); the progress value is strictly monotonic and the bar's total is clamped above elapsed so it never sticks at 100%; the heartbeat is unref()'d + cleared on return; sendNotification is fire-and-forget — a transport hiccup can never fail or slow verify, which is byte-for-byte unchanged for clients that don't render progress
• Cross-client safe: progress is gated per client. Cursor, VS Code, Windsurf, Cline and generic SDK clients get it, but Claude Code is deliberately excluded because (as of 2026) it neither renders progress nor tolerates it: emitting a progress notification tears down its stdio transport and respawns the server (anthropics/claude-code #47765 / #53617). Override with CODELOOP_PROGRESS=on|off
• Works on Windows/macOS/Linux for every app type — Windows/macOS .NET desktop, Flutter mobile, native iOS/Android, and web — with per-phase labels that mirror each platform's runner path
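
The progress rules above (strictly monotonic value, total clamped above elapsed, opt-in via progressToken, per-client gating with a CODELOOP_PROGRESS override) can be sketched as follows. This is a rough illustration under assumptions: the function names, the client-name match, and the precedence between the override and the token are hypothetical, not taken from the server's source.

```typescript
// Hypothetical helpers: names and precedence are assumptions for illustration.
interface ProgressTick {
  progress: number; // elapsed seconds, strictly increasing across ticks
  total: number; // always greater than progress, so the bar never reaches 100%
  message: string;
}

// Returns a tick builder for one run with an estimated duration in seconds.
function createProgressTicker(estimateSec: number) {
  let last = -1;
  return (elapsedSec: number, phase: string): ProgressTick => {
    // Strictly monotonic: bump past the previous value if the clock did not advance.
    const progress = elapsedSec > last ? elapsedSec : last + 1;
    last = progress;
    // Clamp the total above the current progress.
    const total = Math.max(estimateSec, progress + 1);
    const message =
      progress > estimateSec
        ? `${phase}: still running (longer than expected)`
        : phase;
    return { progress, total, message };
  };
}

// Decide whether to emit progress for a client. `override` stands for the
// value of CODELOOP_PROGRESS, passed in so the function stays pure.
function shouldEmitProgress(
  clientName: string,
  hasProgressToken: boolean,
  override: string | undefined,
): boolean {
  if (override === "off") return false;
  if (!hasProgressToken) return false; // opt-in: the client must request progress
  if (override === "on") return true;
  // Assumed name match for the client that is excluded by default.
  return !/claude[- ]?code/i.test(clientName);
}
```

In this sketch a missing progressToken always wins over CODELOOP_PROGRESS=on, since a progress notification has nothing to attach to without a token; whether the server orders the checks the same way is an assumption.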

v0.1.71

May 31, 2026

codeloop-mcp-server

• Progress-stall watchdog — catches the test-suite hang the completion-settle watchdog cannot: a suite that freezes MID-RUN (one isolate leaks a Timer/StreamController/Firebase listener after its test passes, so it never exits) before ever printing an end-of-run marker
• Real fix for the Flutter case where `flutter test --coverage` froze at `+24 -5: …` while the compact reporter's clock ticked 00:16→05:54 — no end-of-run marker (settle never armed) and continuous clock output (silence watchdog never tripped), so only the 6-min hard cap freed it
• CodeLoop now tracks the reporter's progress signature (counts, clock stripped); while it stays frozen the run is force-closed in ~90s — but any real progress resets the grace, and once the end-of-run marker prints the stall watchdog hands off to settle so a `--coverage` lcov dump is never killed
• A mid-run stall is marked FAILED with partial results (never a false pass) and verify asks the user (Y/N) to dispose the leaked resource in tearDown() or re-run with skip_tests:true
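
The stall detection described above can be sketched like this: strip the clock from each compact-reporter line, keep only the counters as a signature, and report a stall once the signature has been frozen for the grace period. This is a minimal sketch under assumptions: `progressSignature` and `StallWatchdog` are hypothetical names, the regex reflects the `00:16 +24 -5: …` line shape quoted above, and timestamps are passed in rather than read from a real timer.

```typescript
// Extract "+24 -5"-style counters from a compact-reporter line, ignoring the
// leading clock. Returns null for lines that carry no counters.
function progressSignature(line: string): string | null {
  const m = line.match(/\+\d+(?: ~\d+)?(?: -\d+)?(?=:)/);
  return m ? m[0] : null;
}

class StallWatchdog {
  private lastSignature: string | null = null;
  private lastChangeMs: number;
  private handedOff = false;
  private readonly graceMs: number;

  constructor(startMs: number, graceMs: number = 90_000) {
    this.lastChangeMs = startMs;
    this.graceMs = graceMs;
  }

  // Feed one reporter line. A changed signature counts as real progress and
  // resets the grace; an end-of-run marker hands off to the settle watchdog.
  observe(line: string, nowMs: number, isEndOfRunMarker: boolean = false): void {
    if (isEndOfRunMarker) {
      this.handedOff = true;
      return;
    }
    const sig = progressSignature(line);
    if (sig !== null && sig !== this.lastSignature) {
      this.lastSignature = sig;
      this.lastChangeMs = nowMs;
    }
  }

  // True once the signature has stayed frozen for the whole grace period.
  isStalled(nowMs: number): boolean {
    return !this.handedOff && nowMs - this.lastChangeMs >= this.graceMs;
  }
}
```

Because the clock is excluded from the signature, a reporter that keeps printing `00:16`, `00:17`, … with unchanged counts still counts as frozen, which is the case the silence-based watchdog could not catch.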

v0.1.70

May 31, 2026

codeloop-mcp-server

• Test-suite hang resilience for EVERY stack — a hanging or leaky test suite can no longer freeze codeloop_verify on any OS (Windows/macOS/Linux), any agent (Cursor/Claude Code), or any app type (website, Node, Python, Rust, native iOS/Android, Windows/macOS .NET, Flutter)
• Hard wall-clock caps added to every previously-untimed runner: Node (npm/jest/vitest/mocha), Python (pytest/unittest), Rust (cargo), Playwright, Maestro, and the native xcodebuild/gradle/dotnet build+test commands
• Silence-based completion-settle watchdog: arms on each runner's end-of-run marker (jest 'Ran all test suites', vitest 'Test Files', mocha 'N passing', pytest '=== N passed in ===', cargo 'test result:', dotnet 'Test Run Successful./Failed.') and force-closes a leaked process ~30s after output goes quiet — recovering the real pass/fail instead of burning the full timeout
• Monorepo-safe: the settle grace resets on every output chunk, so turbo/nx/lerna runs that print one summary per package are never killed mid-stream
• New skip_tests control — per-call `skip_tests:true` on codeloop_verify or persistent config.tests.run=false runs every check EXCEPT the project's own test suite + coverage (incl. native gradle test / dotnet test), while the build still runs
• Honest skipping: the required_tests_pass gate BLOCKS ready_for_review when tests are skipped unless explicitly waived via config.tests.waive_gate — a skipped suite can never silently produce a 'Verified by CodeLoop' result
• verify now distinguishes a true timeout (asks the user Y/N to skip or fix) from an auto-recovered leak (informational: dispose the resource in tearDown())
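
The completion-settle behavior above can be sketched as a small state machine: arm when an end-of-run marker appears, reset the quiet timer on every output chunk, and force-close once output has been silent for about 30s. This is a sketch under assumptions: `SettleWatchdog` and the exact regexes are hypothetical approximations of the markers listed above, and time is injected instead of using real timers.

```typescript
// Approximate end-of-run markers, one per runner named in the notes above.
const END_OF_RUN_MARKERS: RegExp[] = [
  /Ran all test suites/, // jest
  /Test Files/, // vitest
  /\d+ passing/, // mocha
  /=+ .*\d+ passed.* in .*=+/, // pytest
  /test result:/, // cargo
  /Test Run (Successful|Failed)\./, // dotnet
];

class SettleWatchdog {
  private armed = false;
  private lastOutputMs = 0;
  private readonly quietMs: number;

  constructor(quietMs: number = 30_000) {
    this.quietMs = quietMs;
  }

  // Every chunk resets the quiet timer; a marker arms the watchdog.
  onChunk(chunk: string, nowMs: number): void {
    this.lastOutputMs = nowMs;
    if (END_OF_RUN_MARKERS.some((re) => re.test(chunk))) {
      this.armed = true;
    }
  }

  // Force-close only after a marker was seen AND output has gone quiet.
  shouldForceClose(nowMs: number): boolean {
    return this.armed && nowMs - this.lastOutputMs >= this.quietMs;
  }
}
```

Resetting on every chunk is what keeps monorepo runs safe in this sketch: a second package's output arriving after the first package's summary pushes the deadline out again.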

v0.1.20

April 30, 2026

codeloop

• Added /v1/billing/health admin endpoint surfacing Stripe mode + price configuration
• Webhook coverage extended: charge.refunded, four dispute events, invoice.paid, customer.created, payment_method.attached
• Idempotency cache: 24h TTL alongside the 5k-entry LRU bound
• scripts/stripe-live-bootstrap.sh + scripts/stripe-listen-and-trigger.sh — idempotent live-mode provisioning + replay
• Dashboard shows a Stripe environment badge (admin-only)
• codeloop doctor: optional Stripe subsection (CODELOOP_ADMIN_TOKEN) + GEO subsection
• GEO push: 29 per-tool deep pages at /tools/<name>, /docs/llm-search GEO landing page, FAQ + HowTo schemas across docs
• /changelog.json (JSON Feed) + /changelog.atom
• smithery.yaml + AI registry submission tracker + scripts/geo/ping-on-deploy.mjs
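
The idempotency cache above combines two bounds: a 24h TTL and a 5k-entry LRU cap. One way to get both from a single `Map` is sketched below. `IdempotencyCache` and its `seen` method are hypothetical names, and the choice to key on a webhook event id is an assumption; the two limits are the ones stated above.

```typescript
class IdempotencyCache {
  // event id -> expiry timestamp (ms). Map preserves insertion order, which
  // this class uses as recency order.
  private readonly entries = new Map<string, number>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number = 5_000, ttlMs: number = 24 * 60 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // Returns true if `id` was already recorded and has not expired.
  // Otherwise records it and returns false.
  seen(id: string, nowMs: number): boolean {
    const expiry = this.entries.get(id);
    if (expiry !== undefined) {
      this.entries.delete(id);
      if (expiry > nowMs) {
        // Re-insert to mark as most recently used, keeping the original expiry.
        this.entries.set(id, expiry);
        return true;
      }
    }
    this.entries.set(id, nowMs + this.ttlMs);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return false;
  }
}
```

A handler would call `seen(event.id, Date.now())` first and skip processing when it returns true. The TTL bounds how long a replay is suppressed, while the LRU cap bounds memory during bursts.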

v0.1.19

April 30, 2026

codeloop

• Launch operations runbook: /docs/launch + Cursor Marketplace prep + Stripe LIVE preflight
• Benchmark harness: /benchmarks/buggy-commits-50 with weekly cron + network-purity test
• Tier-B Windows runbook: /docs/tier-b + scripts/windows/preflight.ps1 + capture-evidence.ps1
• Claude Code multi-app pack: /docs/claude-code-apps + 5-app fixture clone script
• codeloop doctor: Tier-B + Claude Code evidence subsection
• 5 new docs pages + sitemap + llms-full + DEVELOPMENT_LOG.md

v0.1.15

April 20, 2026

codeloop

• Added `codeloop cursor-rule` CLI command to print/refresh the global Cursor User Rule snippet
• Created ~/.cursor/codeloop-user-rule.md as a stable copy-paste source for Cursor Settings
• Interactive banner during `codeloop init --global` guides Cursor users through one-time rule setup
• Defensive write of ~/.cursor/rules/codeloop.mdc for forward compatibility
• Fixed ESM-compatible test mocking for os.homedir() by using an environment-variable approach

v0.1.14

April 18, 2026

codeloop

• Global rules writer: writes ~/.cursor/mcp.json, ~/.claude.json, and ~/.claude/CLAUDE.md
• New GLOBAL_CURSOR_USER_RULE_SNIPPET template for cross-workspace activation
• MCP tool descriptors updated with FIRST-USE BOOTSTRAP paragraphs for self-bootstrapping
• Hardened INIT_HINT with explicit `codeloop_init_project` call instructions
• 14 new tests locking global rule content and merge behavior

v0.1.13

April 15, 2026

codeloop, codeloop-mcp-server

• Comprehensive design comparison: `codeloop_design_compare` with pixelmatch-based pixel diffing
• Multi-viewport fan-out: compare against Figma frames across mobile, tablet, and desktop viewports
• New `design_compare_evidence` blocker gate — agent cannot ship until pixels match design spec
• Figma REST API integration: configure FIGMA_API_TOKEN and .codeloop/figma.json to auto-fetch frames
• New CLI commands: `codeloop design fetch` and `codeloop design compare` for local smoke testing
• Agent rule templates updated to mandate design comparison when design references are present
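
To make the pixel-diff gate above concrete, here is a deliberately naive stand-in. It is NOT pixelmatch's algorithm (pixelmatch uses a perceptual color distance and anti-aliasing detection); it only compares RGBA channels against a tolerance and returns the fraction of differing pixels. `mismatchRatio`, the `tolerance` parameter, and the raw RGBA buffer layout are assumptions for illustration.

```typescript
// Fraction of pixels whose RGBA channels differ by more than `tolerance`.
// Both buffers must hold width * height pixels in RGBA order.
function mismatchRatio(
  a: Uint8Array,
  b: Uint8Array,
  width: number,
  height: number,
  tolerance: number = 0,
): number {
  const expected = width * height * 4;
  if (a.length !== expected || b.length !== expected) {
    throw new Error("buffer size does not match dimensions");
  }
  if (expected === 0) return 0;
  let differing = 0;
  for (let p = 0; p < expected; p += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[p + c] - b[p + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing / (width * height);
}
```

A gate in the spirit of `design_compare_evidence` would then compare this ratio against a per-viewport threshold; the threshold value and how it is configured are not specified in the notes above.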

v0.1.12

April 10, 2026

codeloop-mcp-server

• Interaction testing: `codeloop_interact` with 40+ actions across macOS, Windows, Linux, Android, iOS
• Windows UI Automation support via `win_accessibility.ts`
• Motion-validated video recording with multi-monitor support
• ffmpeg auto-install with Homebrew bootstrapping on macOS
• App log capture during recording sessions
• Window-scoped screenshot capture with IDE focus restoration

v0.1.11

April 5, 2026

codeloop-mcp-server

• Multi-section orchestration: dependency graph manager and rolling integration checkpoints
• `codeloop_check_workflow` enforcement tool for pre-completion verification
• `codeloop_discover_screens` static scanner for routes across Flutter, web, mobile, Xcode, Android, .NET
• Workflow enforcement: verify run, screenshots, video, gate check, and dev log are all required
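
The enforcement rule above (all five artifacts required before completion) reduces to a set difference. The sketch below is illustrative only: the identifiers in `REQUIRED_EVIDENCE` and the function name are hypothetical labels for the five items listed above, not the tool's actual keys.

```typescript
// Hypothetical identifiers for the five required artifacts named above.
const REQUIRED_EVIDENCE = [
  "verify_run",
  "screenshots",
  "video",
  "gate_check",
  "dev_log",
] as const;

type Evidence = (typeof REQUIRED_EVIDENCE)[number];

// Returns the required artifacts that have not been collected yet, in the
// order they are listed. An empty result means the workflow may complete.
function missingEvidence(collected: ReadonlySet<Evidence>): Evidence[] {
  return REQUIRED_EVIDENCE.filter((e) => !collected.has(e));
}
```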

v0.1.10

March 28, 2026

codeloop, codeloop-mcp-server

• Contributor Rewards Program: +14 free days for every accepted bug report, feature request, or comment
• Anti-abuse layer: email canonicalization, device fingerprint, IP burst guard, key-sharing detection
• Dashboard contribution form with attachment uploader (Vercel Blob, 200MB limit)
• Admin review interface at /admin/contributions with approve/decline workflow
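
Email canonicalization, the first item in the anti-abuse layer above, usually means mapping address variants that reach the same inbox onto one key. The sketch below shows one common approach under assumptions: `canonicalizeEmail` is a hypothetical name, and the specific rules (strip `+tag` for every domain, drop dots and fold googlemail.com for Gmail) are typical choices, not the rules CodeLoop documents.

```typescript
// Map common address variants to a single canonical form.
function canonicalizeEmail(email: string): string {
  const trimmed = email.trim().toLowerCase();
  const at = trimmed.lastIndexOf("@");
  if (at <= 0 || at === trimmed.length - 1) {
    throw new Error("not an email address");
  }
  let local = trimmed.slice(0, at);
  let domain = trimmed.slice(at + 1);

  // Assumption: treat "+tag" sub-addressing as an alias on every domain.
  const plus = local.indexOf("+");
  if (plus > 0) local = local.slice(0, plus);

  // Gmail ignores dots in the local part; googlemail.com is an alias domain.
  if (domain === "googlemail.com") domain = "gmail.com";
  if (domain === "gmail.com") local = local.replace(/\./g, "");

  return `${local}@${domain}`;
}
```

Stripping `+tag` everywhere trades a small risk of merging distinct mailboxes on unusual providers for simpler abuse detection; a stricter version would apply it only to providers known to support sub-addressing.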

v0.1.5

March 28, 2026

@codelooptech/shared

• Updated gate registry with `design_compare_evidence` and `video_evidence` gates
• Failure taxonomy: deterministic_bug, flaky_test, environment_failure categories
• Stop policies: max_repair_attempts and confidence_stall_limit
• Run lineage tracking: run_id, parent_run_id, prompt_template_version
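
The names introduced above can be tied together in a short type sketch. Only the identifiers (`deterministic_bug`, `flaky_test`, `environment_failure`, `max_repair_attempts`, `confidence_stall_limit`, `run_id`, `parent_run_id`, `prompt_template_version`) come from the notes; the `confidence` field, the `shouldStop` function, and the exact stall semantics are assumptions made for illustration.

```typescript
type FailureCategory = "deterministic_bug" | "flaky_test" | "environment_failure";

interface StopPolicy {
  max_repair_attempts: number;
  confidence_stall_limit: number;
}

interface RunRecord {
  run_id: string;
  parent_run_id: string | null; // null for the first run in a lineage
  prompt_template_version: string;
  failure_category?: FailureCategory;
  confidence: number; // hypothetical score in [0, 1]
}

// Assumed semantics: stop once the attempt budget is spent, or once the most
// recent runs have failed to raise confidence `confidence_stall_limit` times
// in a row.
function shouldStop(runs: RunRecord[], policy: StopPolicy): boolean {
  if (runs.length >= policy.max_repair_attempts) return true;
  let stalled = 0;
  for (let i = runs.length - 1; i > 0; i--) {
    if (runs[i].confidence > runs[i - 1].confidence) break;
    stalled++;
  }
  return stalled >= policy.confidence_stall_limit;
}
```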

v0.1.0

March 22, 2026

codeloop, codeloop-mcp-server, @codelooptech/shared

• Initial public beta release
• Core verify-diagnose-fix loop with 7 MCP tools
• Cursor and Claude Code support via MCP
• 14-day free trial with Team-tier allowance
• Dashboard with usage tracking, API key management, and billing
• Cross-platform support: macOS, Windows, Linux