Can GLM-5.1 build a playable Three.js game?

In the April 2026 benchmark, GLM-5.1 produced a playable Three.js arcade racer from the same brief as Claude Opus 4. The run took 48 min 10 s, produced 2,150 lines of code, and needed five debug iterations.

GLM-5.1 vs Claude Opus 4 for Three.js

Opus 4 finished in 22 min 19 s with 1,820 lines and zero debug iterations; GLM-5.1 finished in 48 min 10 s with 2,150 lines and five debug iterations. Both builds met the same exit criteria.

When to pick GLM-5.1 over Claude Opus 4 for game dev

GLM-5.1 is a fit when regional availability, data residency, or pricing on Zhipu's platform matters. On wall-clock time for this Three.js brief, Opus 4 was about 2.16x faster.

Head-to-head · April 2026 — Coding Tool Benchmark

Claude Opus 4 vs GLM-5.1 for games

Claude Opus 4 finished the Three.js brief in 22 min 19 s; GLM-5.1 finished in 48 min 10 s.

Claude Opus 4: pass · GLM-5.1: pass · Winner: Claude Opus 4

Results

Per-model results.

Claude Opus 4 (Anthropic) · Winner

Status: PASS
Duration: 22m 19s
Code Lines: 1,820
Files: —
Debug Loops: 0
Cost: —

GLM-5.1 (Zhipu AI)

Status: PASS
Duration: 48m 10s
Code Lines: 2,150
Files: 13
Debug Loops: 5
Cost: —

Playable builds

Per-model playable builds.

Raw output from each model. Same brief, same assets, same 60-minute ceiling. No manual edits applied.

Playable build · publishing soon

Claude Opus 4 build is being packaged for web.

Full build output is captured for every run. Hosted versions go live shortly after each round. See the blog for round write-ups.

Leaderboard

Wall-clock time to a playable build.

Figure 1. Total elapsed time per model, sorted fastest-first. Failed runs pinned to the end.
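
The ordering rule in the Figure 1 caption can be expressed as a two-part sort key. The sketch below is illustrative only: the `Run` record and the `leaderboard_order` helper are hypothetical names, not the benchmark's own code, and the third entry is a made-up failed run included purely to show the pinning behavior.

```python
from dataclasses import dataclass


@dataclass
class Run:
    model: str
    seconds: int   # total elapsed wall-clock time
    passed: bool


def leaderboard_order(runs: list[Run]) -> list[Run]:
    # False sorts before True, so passing runs come first; within each
    # group, runs are ordered by elapsed time, fastest first.
    return sorted(runs, key=lambda r: (not r.passed, r.seconds))


runs = [
    Run("GLM-5.1", 48 * 60 + 10, True),
    Run("hypothetical-failed-run", 10 * 60, False),  # placeholder, not a real result
    Run("Claude Opus 4", 22 * 60 + 19, True),
]

ordered = [r.model for r in leaderboard_order(runs)]
print(ordered)
```

Sorting on the tuple keeps the failed placeholder last even though its elapsed time is the shortest of the three.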

Analysis

Results

Claude Opus 4: 22 min 19 s, 1,820 lines, 0 debug iterations.
GLM-5.1: 48 min 10 s, 2,150 lines across 13 files, 5 debug iterations.

Opus 4 was 2.16x faster in wall-clock time at comparable code volume. Both builds passed the same exit criteria.
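
As a quick check on the ratio, the durations can be converted to seconds and divided. This is a minimal sketch using only the figures reported above; the helper name `to_seconds` is introduced here for illustration.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes/seconds duration to whole seconds."""
    return minutes * 60 + seconds


opus_s = to_seconds(22, 19)   # Claude Opus 4 wall-clock time
glm_s = to_seconds(48, 10)    # GLM-5.1 wall-clock time

speedup = glm_s / opus_s      # how many times longer GLM-5.1 took
lines_ratio = 2150 / 1820     # GLM-5.1 code volume relative to Opus 4

print(f"speedup: {speedup:.2f}x")          # speedup: 2.16x
print(f"line ratio: {lines_ratio:.2f}x")   # line ratio: 1.18x
```

The line ratio puts a number on "comparable code volume": GLM-5.1's output was about 18% larger.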

Time breakdown

Opus 4: 19 min initial development, 3 min 19 s debug cleanup.
GLM-5.1: 27 min 36 s initial development, 20 min 34 s debug work.

Debug time accounted for 15% of Opus 4's run and 43% of GLM-5.1's run.
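
The debug shares follow from the phase durations listed above. A minimal sketch, with the dictionary layout and the `to_seconds` helper chosen here for illustration:

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes/seconds duration to whole seconds."""
    return minutes * 60 + seconds


# Phase durations as reported in the time breakdown.
phases = {
    "Claude Opus 4": {"initial": to_seconds(19), "debug": to_seconds(3, 19)},
    "GLM-5.1": {"initial": to_seconds(27, 36), "debug": to_seconds(20, 34)},
}

debug_share = {}
for model, p in phases.items():
    total = p["initial"] + p["debug"]
    debug_share[model] = p["debug"] / total
    print(f"{model}: total {total} s, debug {debug_share[model]:.0%}")
```

The two phases sum to the reported totals (1,339 s and 2,890 s), so the breakdown is consistent with the headline durations.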

Output characteristics

Both models built the same Three.js arcade racer from the same prompt, the same generated assets, and the same agentic workflow. GLM-5.1's higher debug-iteration count reflects a fix-in-loop strategy rather than a failure mode: the final output compiled and ran.

Selection criteria

Latency-sensitive work: Opus 4.
Regional availability, data residency, or Zhipu platform integration: GLM-5.1.

Neither model required a retry, human intervention, or fallback to a different agent.

Verdict

Opus 4 produced a playable build in less than half the wall-clock time (22 min 19 s vs 48 min 10 s) with zero debug iterations, while GLM-5.1 required five.
