AI Coding Benchmark

1 round

AI coding models, same game brief, same conditions.

AI coding and LLM benchmark for games — Claude, Qwen, GLM, MiniMax, Kimi, and more, tested on identical Three.js game briefs.

Every round puts each major AI coding model through the same Three.js game brief, with identical generated assets and the same agentic workflow. One variable: the coding model. Results are wall-clock time, debug iterations, code volume, and whether the build runs.

Archive

All rounds.

First round published. New rounds ship when a major model releases.

About the benchmark

FAQ

  • Every major AI coding model gets the same Three.js game brief, the same assets, and the same agentic workflow. Only the model changes. We measure wall-clock time, debug iterations, code volume, and whether the build runs.

Build your own

Build a playable game with Sandscape.

Sandscape takes a text prompt and returns a playable game. The same models in this benchmark run under the hood. No coding required.