Results
Opus 4 finished in 22 min 19 s. 1,820 lines across 11 files. Zero debug iterations.
Sonnet 4 finished in 36 min 47 s. 9,225 lines across 14 files. Four debug iterations.
Code volume ratio: roughly 5:1, with Sonnet 4 writing the larger codebase. Both builds passed the exit criteria.
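The headline ratios follow directly from the figures above. A minimal sketch of the arithmetic, in which the `Run` record and its field names are illustrative and not part of the benchmark tooling:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run. Field names are illustrative; values are from the results above."""
    model: str
    seconds: int           # wall-clock time, in seconds
    lines: int             # total lines of code written
    files: int             # number of files produced
    debug_iterations: int  # debug-loop iterations before a passing build


opus = Run("Opus 4", 22 * 60 + 19, 1_820, 11, 0)
sonnet = Run("Sonnet 4", 36 * 60 + 47, 9_225, 14, 4)

# Code volume: Sonnet 4 lines divided by Opus 4 lines.
volume_ratio = sonnet.lines / opus.lines
# Wall-clock: Sonnet 4 time divided by Opus 4 time.
time_ratio = sonnet.seconds / opus.seconds

print(f"code volume ratio: {volume_ratio:.2f}:1")  # 5.07:1
print(f"wall-clock ratio:  {time_ratio:.2f}:1")    # 1.65:1
for run in (opus, sonnet):
    print(f"{run.model}: {run.lines / run.files:.0f} lines per file")
```

The same figures put Sonnet 4 at about 1.65 times Opus 4's wall-clock time and at roughly four times as many lines per file.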
Per-model breakdown
Sonnet 4 scaffolded broadly on the first pass, then relied on the debug loop to correct compile and runtime failures over four iterations. Opus 4 produced a smaller codebase that compiled and ran on the first attempt.
Opus 4 was the only passing model in the April 2026 round to reach zero debug iterations on the Three.js brief, which required physics, the render loop, and input handling to initialize together on first boot.
Sonnet 4's four debug iterations fall within the pass band. The model detected failures, patched them, and continued until the build ran.
Selection guidance
Opus 4 fits workloads where wall-clock time or first-pass correctness is the binding constraint, including live or agentic demos where a failed first compile is visible to the user.
Sonnet 4 fits workloads where token cost is the binding constraint, where runs are batched, or where a debug-loop stage is already part of the pipeline.
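The guidance above can be read as a simple decision rule keyed on the binding constraint. The sketch below is only one way to encode it: the `Constraint` names and the `suggest_model` helper are hypothetical labels for the conditions described, and a real selection would likely weigh several factors at once.

```python
from enum import Enum


class Constraint(Enum):
    """Hypothetical labels for the binding constraints named in the guidance."""
    WALL_CLOCK = "wall-clock time"
    FIRST_PASS = "first-pass correctness"
    TOKEN_COST = "token cost"
    BATCHED = "batched runs"
    HAS_DEBUG_LOOP = "debug-loop stage already in the pipeline"


def suggest_model(binding: Constraint) -> str:
    """Map a single binding constraint to the model the guidance points to."""
    if binding in (Constraint.WALL_CLOCK, Constraint.FIRST_PASS):
        return "Opus 4"
    # Token cost, batched runs, or an existing debug-loop stage.
    return "Sonnet 4"


print(suggest_model(Constraint.FIRST_PASS))  # Opus 4
print(suggest_model(Constraint.TOKEN_COST))  # Sonnet 4
```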
Methodology
We gave both models the same Three.js combat racer brief and the same exit criteria. We measured wall-clock time, total lines of code written, number of files produced, and number of debug-loop iterations required to reach a passing build. Nothing differed between runs: the prompt was unchanged, no manual edits were made, and the tool configuration was identical.
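For readers who want to collect comparable numbers, here is a minimal sketch of how the file count, line count, and wall-clock time could be gathered. It is not the harness used for this round: the helper names are hypothetical, and it assumes every regular file under the output directory counts and that blank and comment lines are included, which the methodology does not specify.

```python
import time
from pathlib import Path
from typing import Callable, Tuple


def count_output(root: Path) -> Tuple[int, int]:
    """Return (file_count, total_lines) for all regular files under root.

    Assumption: blank lines and comments are counted like any other line.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    total_lines = 0
    for path in files:
        # errors="replace" keeps the count going if a file is not valid UTF-8.
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            total_lines += sum(1 for _ in fh)
    return len(files), total_lines


def timed(build: Callable[[], None]) -> float:
    """Run build() once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    build()
    return time.perf_counter() - start
```

A run would then call `timed(...)` around the model session and `count_output(...)` on the directory it wrote to; counting debug-loop iterations depends on the pipeline and is left out here.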