Cost is the estimated USD API price for one full ATM-Bench-Hard run (31 questions), computed from per-call token usage (uncached input, cache write, cache read, output) at each provider's public list ...
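The cost calculation described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual script: the names `Usage`, `PricePerMTok`, `call_cost`, and `run_cost` are hypothetical, prices are assumed to be quoted per million tokens, and the rates in the usage note are placeholders rather than any provider's real list prices.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Usage:
    """Token counts for one API call (hypothetical structure)."""
    uncached_input: int
    cache_write: int
    cache_read: int
    output: int


@dataclass(frozen=True)
class PricePerMTok:
    """USD per one million tokens, per token category (assumed unit)."""
    uncached_input: float
    cache_write: float
    cache_read: float
    output: float


def call_cost(usage: Usage, price: PricePerMTok) -> float:
    """Estimated USD cost of a single call: tokens times rate, per category."""
    total = (
        usage.uncached_input * price.uncached_input
        + usage.cache_write * price.cache_write
        + usage.cache_read * price.cache_read
        + usage.output * price.output
    )
    return total / 1_000_000


def run_cost(calls: Iterable[Usage], price: PricePerMTok) -> float:
    """Estimated USD cost of a full run: the sum over all of its calls."""
    return sum(call_cost(u, price) for u in calls)
```

For example, with placeholder rates `PricePerMTok(1.0, 2.0, 0.5, 4.0)`, a run's cost is `run_cost(calls, price)`, where `calls` holds one `Usage` record per API call made across the 31 questions.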