AI Model Benchmark Scores

Compare 34 LLMs by MMLU, HumanEval, MATH, and Arena Elo scores. Find the best model for your use case, ranked by published benchmark results.


MMLU Scores — All Models

Tiers: Premium, Mid, Budget
Columns: # | Model | Tier | MMLU | HumanEval | MATH | Arena Elo | Composite | Cost $/1M

Head-to-Head Benchmark Comparison

Understanding AI Model Benchmarks

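Benchmark scores provide a standardized way to compare AI model capabilities. Here's what each benchmark measures:

MMLU (Massive Multitask Language Understanding): accuracy on multiple-choice questions spanning 57 academic and professional subjects.
HumanEval: share of 164 hand-written Python programming problems for which the generated code passes the unit tests, usually reported as pass@1.
MATH: accuracy on competition-level mathematics problems.
Arena Elo: a rating derived from crowdsourced head-to-head human preference votes; a higher rating means the model's answers are preferred more often.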
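Unlike the percentage-based benchmarks, an Arena Elo rating only has meaning relative to another rating. The sketch below shows how a rating gap translates into an expected preference rate, assuming the ratings follow the conventional Elo scale (base 10, 400-point divisor). The function name `expected_score` and the ratings used are hypothetical and are not taken from the table.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head votes won by model A over model B
    under the standard Elo model (base-10 logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings for illustration only.
    baseline = 1200
    for gap in (0, 50, 100, 200):
        share = expected_score(baseline + gap, baseline)
        print(f"rating gap {gap:>3}: expected preference share {share:.3f}")
```

Under this assumption, a 100-point gap corresponds to the higher-rated model being preferred in roughly 64% of comparisons, so small Elo differences between neighboring models imply close to even odds.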

Important: Benchmark scores are estimates based on published results and community data. Actual performance varies by task, prompt, and use case. Always test with your specific workload before committing to a model.
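This page does not state how the Composite column is calculated. As a rough illustration of one way such a score could be built, the sketch below min-max normalizes each benchmark across the models being compared and averages the results with equal weights. The method, the function names, and the model scores are all assumptions made for the example; they are not this site's formula or real data.

```python
from statistics import mean

BENCHMARKS = ("mmlu", "humaneval", "math", "arena_elo")


def min_max(values):
    """Rescale values to the range 0..1; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(models):
    """Return {model name: equal-weight composite on a 0..100 scale}.

    `models` maps a model name to a dict with one score per benchmark.
    Normalization is relative to the models passed in, so adding or
    removing a model can change every composite.
    """
    names = list(models)
    normalized = {name: [] for name in names}
    for bench in BENCHMARKS:
        column = min_max([models[name][bench] for name in names])
        for name, value in zip(names, column):
            normalized[name].append(value)
    return {name: 100 * mean(vals) for name, vals in normalized.items()}


if __name__ == "__main__":
    # Made-up scores for three hypothetical models.
    example = {
        "model_a": {"mmlu": 88.0, "humaneval": 90.0, "math": 70.0, "arena_elo": 1280},
        "model_b": {"mmlu": 80.0, "humaneval": 75.0, "math": 50.0, "arena_elo": 1200},
        "model_c": {"mmlu": 70.0, "humaneval": 60.0, "math": 30.0, "arena_elo": 1100},
    }
    for name, score in composite_scores(example).items():
        print(f"{name}: {score:.1f}")
```

Normalizing first matters because Arena Elo sits on a different scale from the percentage benchmarks; averaging the raw numbers would let Elo dominate the result.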