Head-to-head · April 2026 — Coding Tool Benchmark

Claude Opus 4 vs Claude Sonnet 4for games

Claude Opus 4 vs Claude Sonnet 4 for games — AI coding benchmark on a Three.js game build with timing, debug iterations, and code metrics.

Claude Opus 4 and Claude Sonnet 4 produced playable builds from the same Three.js brief with a 5:1 gap in code volume.

pass·pass·Winner: Claude Opus 4
Results

Per-model results.

Anthropic

Claude Opus 4

Winner
Status
PASS
Duration
22m 19s
Code Lines
1,820
Files
11
Debug Loops
0
Cost
Anthropic

Claude Sonnet 4

Status
PASS
Duration
36m 47s
Code Lines
9,225
Files
14
Debug Loops
4
Cost
Playable builds

Per-model playable builds.

Raw output from each model. Same brief, same assets, same 60-minute ceiling. No manual edits applied.

Playable build · publishing soon
Claude Opus 4 build is being packaged for web.

Full build output is captured for every run. Hosted versions go live shortly after each round. See the blog for round write-ups.

Now showing: Claude Opus 4
Leaderboard

Wall-clock time to a playable build.

Winner
Pass
Fail

Figure 1. Total elapsed time per model, sorted fastest-first. Failed runs pinned to the end.

Analysis

Results

Opus 4 finished in 22 min 19 s. 1,820 lines across 11 files. Zero debug iterations.

Sonnet 4 finished in 36 min 47 s. 9,225 lines across 14 files. Four debug iterations.

Code volume ratio: 5:1 in Sonnet's direction. Both builds passed the exit criteria.

Per-model breakdown

Sonnet 4 scaffolded broadly on the first pass, then relied on the debug loop to correct compile and runtime failures over four iterations. Opus 4 produced a smaller codebase that compiled and ran on the first attempt.

Opus 4 was the only passing model in the April 2026 round to reach zero debug iterations on the Three.js brief, which required physics, the render loop, and input handling to initialize together on first boot.

Four debug iterations for Sonnet 4 is within the pass band. The model detected failures, patched them, and continued until the build ran.

Selection guidance

Opus 4 fits workloads where wall-clock time or first-pass correctness is the binding constraint, including live or agentic demos where a failed first compile is visible to the user.

Sonnet 4 fits workloads where token cost is the binding constraint, where runs are batched, or where a debug-loop stage is already part of the pipeline.

Methodology

We gave both models the same Three.js combat racer brief and the same exit criteria. We measured wall-clock time, total lines of code written, number of files produced, and number of debug-loop iterations required to reach a passing build. No prompt changes, no manual edits, and no tool-configuration differences between runs.

Verdict

Opus 4 shipped in 22 min 19 s with 1,820 lines and zero debug iterations; Sonnet 4 needed 36 min 47 s, 9,225 lines, and four debug passes.

FAQ

FAQ

  • In the April 2026 Three.js build benchmark, Opus 4 finished in 22 min 19 s with 1,820 lines and zero debug iterations. Sonnet 4 finished in 36 min 47 s with 9,225 lines and four debug iterations.
Full round

2 of 8 models shown. See the full round.

All models in the round ran the same brief, same assets, same agentic workflow. The archive has full per-model metrics.