adhi.ai | humanity's last exam

Evidence Leaderboard

Public evidence board for the collapse track.

Core Question

Gates: --/-- Failed: -- Confidence: -- Track: cfe-collapse

Collapse Models

--

Focus Track

cfe-collapse

Top Regen

--

Avg Fallback

--

Restart Answer

--

Top 10 Trending Model Standings

How OpenRouter trending models currently perform on the CFE collapse benchmark.

Source: -- Benchmarked: --/--
OR Rank | Model | CFE Rank | Regen | Decision | Pass Rate | Runs | Status
Rank Track Model Regen Collapse Objective Decision Cost USD Pass Rate Fallback Runs