Round: April 16, 2026 / Combat Arcade Racer

April 2026 — Coding Tool Benchmark

AI coding model benchmark — Claude, Qwen, GLM, MiniMax, Kimi, and MiMo compared on a real-world Three.js game build.

8 coding models built the same Three.js game brief. Identical prompt, identical generated assets, identical agentic workflow. The only variable is the model.

Models: 8
Pass rate: 75%
Fastest: 22m 19s (Claude Opus 4)
Published: April 16, 2026
Game: Combat Arcade Racer
Genre: Combat Racing / Arcade Sim
Methodology

One variable: the coding model.

Prompt, assets, workflow, and strategy advice are identical across runs. Only the model changes.

Brief

Combat Arcade Racer

A high-octane street racer that merges responsive arcade-style car mechanics with aggressive power-up systems in dense metropolitan environments. Players navigate claustrophobic city circuits, utilizing tactical abilities to outmaneuver traffic and complete high-risk challenges in a quest for urban dominance.

Combat Racing / Arcade Sim / Urban Sports / Competitive
Held constant
Concept & prompt: One game design doc, shared verbatim.
Generated assets: Same concept art, 3D models, and audio.
Workflow: build_mid_no_strategy, an agentic coder loop.
Strategy advisor: Pre-baked recommendations injected identically.
Debug iterations: Bounded loop. Model decides when to stop.
Timeout: 60 minutes per run. After that, it fails.
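
The bounded debug loop and the timeout can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the `run_build` function, its callables, and the cap of 8 debug iterations are hypothetical (the page states that the loop is bounded but not what the bound is), and the actual build_mid_no_strategy workflow is not shown here. Only the 60-minute limit comes from the methodology.

```python
import time
from typing import Callable

TIMEOUT_SECONDS = 60 * 60  # 60-minute limit stated in the methodology


def run_build(
    initial_build: Callable[[], None],   # hypothetical: first coding pass
    model_is_done: Callable[[], bool],   # hypothetical: model's own stop decision
    debug_step: Callable[[], None],      # hypothetical: one debug iteration
    max_debug_loops: int = 8,            # assumed bound; not given on the page
    timeout_s: float = TIMEOUT_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run one build: initial pass, then a bounded, time-limited debug loop."""
    start = clock()
    initial_build()

    loops = 0
    satisfied = model_is_done()
    # Keep debugging until the model stops, the bound is hit, or time runs out.
    while not satisfied and loops < max_debug_loops and clock() - start < timeout_s:
        debug_step()
        loops += 1
        satisfied = model_is_done()

    elapsed = clock() - start
    status = "pass" if satisfied and elapsed < timeout_s else "fail"
    return {"status": status, "debug_loops": loops, "elapsed_s": elapsed}


if __name__ == "__main__":
    # A model that is satisfied straight after the initial build needs no debug loops.
    print(run_build(lambda: None, lambda: True, lambda: None))
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the timeout path be exercised without waiting an hour.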
Leaderboard

Wall-clock time to a playable build.

Figure 1. Total elapsed time per model, sorted fastest-first. Failed runs pinned to the end. Legend: winner, pass, fail.
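
The ordering described for Figure 1 (fastest first, failed runs pinned to the end) amounts to a two-part sort key. The sketch below uses the statuses and durations reported on this page; the `Run` tuple and the `leaderboard_order` helper are illustrative names, not part of the benchmark's own code.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    model: str
    passed: bool
    seconds: int  # wall-clock duration in seconds


def mmss(minutes: int, seconds: int = 0) -> int:
    """Convert a 'XXm YYs' duration to seconds."""
    return minutes * 60 + seconds


# Statuses and durations as reported in the data on this page.
RUNS = [
    Run("Claude Opus 4", True, mmss(22, 19)),
    Run("Claude Sonnet 4", True, mmss(36, 47)),
    Run("GLM-5.1", True, mmss(48, 10)),
    Run("MiniMax M2.7", True, mmss(57, 0)),
    Run("Qwen 3.5 35B", False, mmss(60, 0)),
    Run("Kimi K2.6", False, mmss(60, 0)),
    Run("MiMo V2 Pro", True, mmss(66, 0)),
    Run("Qwen 3.6 Plus", True, mmss(79, 15)),
]


def leaderboard_order(runs: List[Run]) -> List[Run]:
    # False sorts before True, so passing runs come first;
    # within each group, shorter durations come first.
    return sorted(runs, key=lambda r: (not r.passed, r.seconds))


if __name__ == "__main__":
    for rank, run in enumerate(leaderboard_order(RUNS), start=1):
        print(f"{rank:02d} {run.model}")
```

Because `sorted` is stable, the two failed runs, which share the same 60m 0s duration, keep their input order.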

Data table

Full per-model data.

Rank  Model            Provider   Status  Duration  Cost  Code lines  Files  Debug loops
01    Claude Opus 4    Anthropic  Pass    22m 19s   -     1,820       11     0
02    Claude Sonnet 4  Anthropic  Pass    36m 47s   -     9,225       14     4
03    GLM-5.1          Zhipu AI   Pass    48m 10s   -     2,150       13     5
04    MiniMax M2.7     MiniMax    Pass    57m 0s    -     1,980       12     6
07    Qwen 3.5 35B     Alibaba    Fail    60m 0s    -     850         7      8
08    Kimi K2.6        Moonshot   Fail    60m 0s    -     620         5      8
05    MiMo V2 Pro      Xiaomi     Pass    66m 0s    -     1,740       10     7
06    Qwen 3.6 Plus    Alibaba    Pass    79m 15s   -     2,610       15     4
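
The headline figures (75% pass rate, fastest run 22m 19s) follow directly from the per-model data above. A short sketch that recomputes them; `parse_duration` is an illustrative helper, not something the benchmark publishes.

```python
import re

# (model, status, duration) as listed in the per-model data.
ROWS = [
    ("Claude Opus 4", "Pass", "22m 19s"),
    ("Claude Sonnet 4", "Pass", "36m 47s"),
    ("GLM-5.1", "Pass", "48m 10s"),
    ("MiniMax M2.7", "Pass", "57m 0s"),
    ("Qwen 3.5 35B", "Fail", "60m 0s"),
    ("Kimi K2.6", "Fail", "60m 0s"),
    ("MiMo V2 Pro", "Pass", "66m 0s"),
    ("Qwen 3.6 Plus", "Pass", "79m 15s"),
]


def parse_duration(text: str) -> int:
    """Parse a duration such as '22m 19s' into seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


passed = [row for row in ROWS if row[1] == "Pass"]
pass_rate = len(passed) / len(ROWS)                         # 6 of 8 runs
fastest = min(passed, key=lambda row: parse_duration(row[2]))

print(f"Pass rate: {pass_rate:.0%}")          # Pass rate: 75%
print(f"Fastest: {fastest[2]} ({fastest[0]})")  # Fastest: 22m 19s (Claude Opus 4)
```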

Cost column fills in when pipeline telemetry lands.

Games

Output per model.

Per-model breakdown

Every model, per-run data.

01 Claude Opus 4
Provider: Anthropic
Status: pass
Duration: 22m 19s
Cost: not yet reported
Code lines: 1,820
Files: 11
Debug loops: 0
Debug %: 15%
02 Claude Sonnet 4
Provider: Anthropic
Status: pass
Duration: 36m 47s
Cost: not yet reported
Code lines: 9,225
Files: 14
Debug loops: 4
Debug %: 51%
03 GLM-5.1
Provider: Zhipu AI
Status: pass
Duration: 48m 10s
Cost: not yet reported
Code lines: 2,150
Files: 13
Debug loops: 5
Debug %: 43%
04 MiniMax M2.7
Provider: MiniMax
Status: pass
Duration: 57m 0s
Cost: not yet reported
Code lines: 1,980
Files: 12
Debug loops: 6
Debug %: 46%
05 MiMo V2 Pro
Provider: Xiaomi
Status: pass
Duration: 66m 0s
Cost: not yet reported
Code lines: 1,740
Files: 10
Debug loops: 7
Debug %: 47%
06 Qwen 3.6 Plus
Provider: Alibaba
Status: pass
Duration: 79m 15s
Cost: not yet reported
Code lines: 2,610
Files: 15
Debug loops: 4
Debug %: 11%
07 Qwen 3.5 35B
Provider: Alibaba
Status: fail
Duration: 60m 0s
Cost: not yet reported
Code lines: 850
Files: 7
Debug loops: 8
Debug %: 60%
08 Kimi K2.6
Provider: Moonshot
Status: fail
Duration: 60m 0s
Cost: not yet reported
Code lines: 620
Files: 5
Debug loops: 8
Debug %: 64%
Build your own

Build a playable game with Sandscape.

Sandscape takes a text prompt and returns a playable game. The same models in this benchmark run under the hood. No coding required.