Registry

Models

33 models registered on ScoreCrux. Models are normalised on submission to prevent duplicates.
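The page does not say how submitted names are normalised. As a rough illustration only, the sketch below assumes a canonical key built by Unicode-normalising, lowercasing, and collapsing separators. The functions `normalise_model_name` and `register` are hypothetical and are not part of ScoreCrux.

```python
import re
import unicodedata


def normalise_model_name(raw: str) -> str:
    """Build a hypothetical canonical key for a submitted model name.

    Assumed rules (not documented by ScoreCrux): NFKC-normalise, trim,
    lowercase, and collapse whitespace, underscores, and hyphens into
    single hyphens.
    """
    text = unicodedata.normalize("NFKC", raw).strip().lower()
    return re.sub(r"[\s_\-]+", "-", text)


def register(registry: dict[str, str], submitted_name: str) -> str:
    """Register a name and return the display name stored for its key.

    The first spelling seen for a key is kept, so later variants of the
    same model map onto the existing entry instead of creating a new one.
    """
    key = normalise_model_name(submitted_name)
    return registry.setdefault(key, submitted_name.strip())


registry: dict[str, str] = {}
register(registry, "Claude Sonnet 4.6")
register(registry, "claude-sonnet-4.6")       # same key as the first entry
register(registry, "  Claude  Sonnet_4.6 ")   # same key again
register(registry, "GPT-5.4 Mini")
```

Under these assumed rules, the three spellings of Claude Sonnet 4.6 collapse into one registry entry and GPT-5.4 Mini gets its own.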

Models are ranked across all three benchmarks: Scale, Leaderboard, and Top Floor. Scale uses the C0/C2 arms only (model capability without tooling). Leaderboard uses best Em and average accuracy. Top Floor uses campaign score and highest floor reached.

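| # | Model | Score | Scale (C0/C2): Avg Em | Scale (C0/C2): Avg Recall | Scale (C0/C2): Best Recall | | Leaderboard: C0 | Leaderboard: C2 | Leaderboard: T | Leaderboard: Em/$ | Top Floor: Floor | Top Floor: Avg/Floor | Safe | Runs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 (Anthropic) | 74.9 | 23.6 | 66% | 88% | ABDG | 32% | 88% | 88% | 7.2 | 10 | 29.9 | 100% | 13 |
| 2 | Claude Haiku 4.5 (Anthropic) | 64.1 | 52.1 | 69% | 94% | ABGD | 28% | 87% | 86% | 80.1 | -- | -- | 100% | 11 |
| 3 | GPT-5.4 (OpenAI) | 47.9 | 25.3 | 63% | 100% | ABDG | 33% | 85% | 85% | 5.2 | -- | -- | 100% | 12 |
| 4 | GPT-4.1 Mini (OpenAI) | 46.7 | 28.3 | 54% | 88% | GADB | 29% | 87% | 84% | 182.4 | -- | -- | 83% | 12 |
| 5 | GPT-5.4 Mini (OpenAI) | 30.9 | 35.2 | 58% | 88% | ABDG | -- | -- | -- | -- | -- | -- | 100% | 8 |
| 6 | Claude Opus 4.6 (Anthropic) | 28.7 | 26.3 | 73% | 94% | ABDG | -- | -- | -- | -- | -- | -- | 100% | 8 |
| 7 | Claude Opus 4.7 (Anthropic) | 26.2 | 23.1 | 81% | 88% | AB | -- | -- | -- | -- | -- | -- | 100% | 4 |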
