Results
- Claude Opus 4: 22 min 19 s, 1,820 lines, 0 debug iterations.
- GLM-5.1: 48 min 10 s, 2,150 lines across 13 files, 5 debug iterations.
Opus 4 was 2.16x faster in wall-clock time at comparable code volume. Both builds passed the same exit criteria.
Time breakdown
- Opus 4: 19 min initial development, 3 min 19 s debug cleanup.
- GLM-5.1: 27 min 36 s initial development, 20 min 34 s debug work.
Debug time accounted for 15% of Opus 4's run and 43% of GLM-5.1's run.
Output characteristics
Both models built the same Three.js arcade racer from the same prompt, the same generated assets, and the same agentic workflow. GLM-5.1's higher debug-iteration count reflects a fix-in-loop strategy rather than a failure mode: the final output compiled and ran.
Selection criteria
- Latency-sensitive work: Opus 4.
- Regional availability, data residency, or Zhipu platform integration: GLM-5.1.
Neither model required a retry, human intervention, or fallback to a different agent.