Registry
Models
33 models registered on ScoreCrux. Models are normalised on submission to prevent duplicates.
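
How ScoreCrux normalises names is not documented here. As a rough illustration only, duplicate prevention of this kind could be sketched as below. The function names `normalise_model_name` and `register`, and the specific rules (case folding, treating spaces, underscores and hyphens as equivalent), are assumptions made for this example, not the site's actual behaviour.

```python
import re
import unicodedata


def normalise_model_name(raw: str) -> str:
    """Collapse superficial spelling differences into one canonical key.

    Hypothetical rules, chosen for illustration only.
    """
    # NFKC folds compatibility characters (e.g. full-width letters) to plain forms.
    text = unicodedata.normalize("NFKC", raw).casefold().strip()
    # Treat any run of whitespace, underscores or hyphens as a single hyphen.
    text = re.sub(r"[\s_\-]+", "-", text)
    return text.strip("-")


def register(registry: dict[str, str], raw: str) -> bool:
    """Add a model under its normalised key; return False if already present."""
    key = normalise_model_name(raw)
    if key in registry:
        return False
    registry[key] = raw  # keep the first submitted spelling for display
    return True


registry: dict[str, str] = {}
first = register(registry, "GPT-5.4 Mini")   # new entry
second = register(registry, "gpt 5.4 mini")  # same key, treated as a duplicate
```

Under these assumed rules, both spellings map to the same key, so the second submission is rejected and the registry holds a single entry.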

Models are ranked across all three benchmarks: Scale, Leaderboard, and Top Floor. Scale uses the C0/C2 arms only (model capability without tooling). Leaderboard uses best Em and average accuracy. Top Floor uses campaign score and highest floor reached.
| # | Model | Score | Scale: Avg Em | Scale: Avg Recall | Scale: Best Recall | | Leaderboard: C0 | Leaderboard: C2 | Leaderboard: T | Leaderboard: Em/$ | Top Floor: Floor | Top Floor: Avg/Floor | Safe | Runs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 (Anthropic) | 74.9 | 23.6 | 66% | 88% | ABDG | 32% | 88% | 88% | 7.2 | 10 | 29.9 | 100% | 13 |
| 2 | Claude Haiku 4.5 (Anthropic) | 64.1 | 52.1 | 69% | 94% | ABGD | 28% | 87% | 86% | 80.1 | -- | -- | 100% | 11 |
| 3 | GPT-5.4 (OpenAI) | 47.9 | 25.3 | 63% | 100% | ABDG | 33% | 85% | 85% | 5.2 | -- | -- | 100% | 12 |
| 4 | GPT-4.1 Mini (OpenAI) | 46.7 | 28.3 | 54% | 88% | GADB | 29% | 87% | 84% | 182.4 | -- | -- | 83% | 12 |
| 5 | GPT-5.4 Mini (OpenAI) | 30.9 | 35.2 | 58% | 88% | ABDG | -- | -- | -- | -- | -- | -- | 100% | 8 |
| 6 | Claude Opus 4.6 (Anthropic) | 28.7 | 26.3 | 73% | 94% | ABDG | -- | -- | -- | -- | -- | -- | 100% | 8 |
| 7 | Claude Opus 4.7 (Anthropic) | 26.2 | 23.1 | 81% | 88% | AB | -- | -- | -- | -- | -- | -- | 100% | 4 |