Is Claude Opus 4 worth the cost over Sonnet 4 for game development?

Opus 4 used about half the wall-clock time and about one-fifth the code volume in this benchmark. Sonnet 4 still produced a playable build, so the answer depends on whether wall-clock time or token cost is the binding constraint.

Can Claude Sonnet 4 build a Three.js game without human help?

Yes. In this benchmark Sonnet 4 took 36 min 47 s and four debug iterations, and the final build passed the exit criteria.

Head-to-head · April 2026 — Coding Tool Benchmark

Claude Opus 4 vs Claude Sonnet 4for games

Claude Opus 4 and Claude Sonnet 4 produced playable builds from the same Three.js brief with a 5:1 gap in code volume.

pass·pass·Winner: Claude Opus 4

Results

Per-model results.

Anthropic

Claude Opus 4

Winner

Status

PASS

Duration

22m 19s

Code Lines

1,820

Files

11

Debug Loops

0

Cost

—

Anthropic

Claude Sonnet 4

Status

PASS

Duration

36m 47s

Code Lines

9,225

Files

14

Debug Loops

4

Cost

—

Playable builds

Per-model playable builds.

Raw output from each model. Same brief, same assets, same 60-minute ceiling. No manual edits applied.

Playable build · publishing soon

Claude Opus 4 build is being packaged for web.

Full build output is captured for every run. Hosted versions go live shortly after each round. See the blog for round write-ups.

Now showing: Claude Opus 4

Leaderboard

Wall-clock time to a playable build.

Winner

Pass

Fail

Figure 1. Total elapsed time per model, sorted fastest-first. Failed runs pinned to the end.

Analysis

Results

Opus 4 finished in 22 min 19 s. 1,820 lines across 11 files. Zero debug iterations.

Sonnet 4 finished in 36 min 47 s. 9,225 lines across 14 files. Four debug iterations.

Code volume ratio: 5:1 in Sonnet's direction. Both builds passed the exit criteria.

Per-model breakdown

Sonnet 4 scaffolded broadly on the first pass, then relied on the debug loop to correct compile and runtime failures over four iterations. Opus 4 produced a smaller codebase that compiled and ran on the first attempt.

Opus 4 was the only passing model in the April 2026 round to reach zero debug iterations on the Three.js brief, which required physics, the render loop, and input handling to initialize together on first boot.

Four debug iterations for Sonnet 4 is within the pass band. The model detected failures, patched them, and continued until the build ran.

Selection guidance

Opus 4 fits workloads where wall-clock time or first-pass correctness is the binding constraint, including live or agentic demos where a failed first compile is visible to the user.

Sonnet 4 fits workloads where token cost is the binding constraint, where runs are batched, or where a debug-loop stage is already part of the pipeline.

Methodology

We gave both models the same Three.js combat racer brief and the same exit criteria. We measured wall-clock time, total lines of code written, number of files produced, and number of debug-loop iterations required to reach a passing build. No prompt changes, no manual edits, and no tool-configuration differences between runs.

Verdict

Opus 4 shipped in 22 min 19 s with 1,820 lines and zero debug iterations; Sonnet 4 needed 36 min 47 s, 9,225 lines, and four debug passes.

FAQ

In the April 2026 Three.js build benchmark, Opus 4 finished in 22 min 19 s with 1,820 lines and zero debug iterations. Sonnet 4 finished in 36 min 47 s with 9,225 lines and four debug iterations.

Other comparisons