Leaderboard

Explore how AI models perform across our five core evaluation categories. Rankings are based on real-world conversations and human evaluations, measuring what truly matters in an AI assistant.

| Rank | Model | Model ID | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | Gemini 3 Pro (New) | gemini-3-pro-preview | 93.18 | 96.0 | 93.0 | 94.0 | 88.7 | 94.3 | 80.0 | $2.00 / $12.00 | 2M | 65.5K |
| #2 | Claude Opus 4.5 (New) | claude-opus-4-5-202511... | 91.38 | 90.5 | 87.7 | 95.3 | 89.6 | 93.8 | 60.0 | $5.00 / $25.00 | 200K | 64K |
| #3 | Claude Sonnet 4.5 | claude-sonnet-4-5 | 91.22 | 91.5 | 88.8 | 94.0 | 89.5 | 92.3 | 68.5 | $3.00 / $15.00 | 200K | 64K |
| #4 | GPT-5.1 | gpt-5.1 | 90.92 | 92.5 | 90.5 | 95.0 | 82.1 | 94.5 | 80.0 | $2.50 / $10.00 | 272K | 8.2K |
| #5 | Grok 4.1 Thinking (New) | grok-4.1-thinking | 89.96 | 91.5 | 91.5 | 94.0 | 79.8 | 93.0 | 62.0 | $3.00 / $15.00 | 256K | 8.2K |
| #6 | GPT-5 | gpt-5 | 89.29 | 91.0 | 87.7 | 94.0 | 80.3 | 93.5 | 72.0 | $1.25 / $10.00 | 400K | 128K |
| #7 | Grok 4.1 (New) | grok-4.1 | 88.85 | 90.5 | 91.0 | 92.5 | 78.3 | 92.0 | 78.0 | $3.00 / $15.00 | 256K | 8.2K |
| #8 | o3 | o3-2025-04-16 | 88.84 | 89.5 | 90.0 | 93.2 | 77.5 | 94.0 | 58.0 | $2.00 / $8.00 | 200K | 100K |
| #9 | Claude Opus 4.1 | claude-opus-4-1 | 88.30 | 90.0 | 83.5 | 92.0 | 85.5 | 90.5 | 54.0 | $15.00 / $75.00 | 200K | 8.2K |
| #10 | ChatGPT-4o | chatgpt-4o | 87.30 | 88.5 | 90.5 | 89.0 | 81.5 | 87.0 | 82.2 | $5.00 / $15.00 | 128K | 16.4K |
| #11 | Claude Haiku 4.5 | claude-haiku-4-5 | 86.80 | 87.5 | 82.0 | 90.0 | 86.5 | 88.0 | 93.0 | $1.00 / $5.00 | 200K | 64K |
| #12 | Grok 4 Fast Reasoning | grok-4-fast-reasoning | 86.70 | 89.0 | 86.5 | 91.5 | 76.0 | 90.5 | 88.0 | $0.20 / $0.50 | 2M | 8.2K |
| #13 | Gemini 2.5 Pro | gemini-2.5-pro | 86.47 | 88.3 | 85.3 | 91.3 | 78.3 | 89.2 | 60.4 | $2.00 / $12.00 | 1M | 65.5K |
| #14 | o4-mini | o4-mini | 86.30 | 89.5 | 84.5 | 94.5 | 72.0 | 91.0 | 95.0 | $1.10 / $4.40 | 200K | 100K |
| #15 | DeepSeek V3.2 Thinking | deepseek-reasoner-v3.2 | 86.03 | 89.0 | 74.0 | 93.5 | 79.7 | 94.0 | 45.0 | $0.14 / $0.28 | 160K | 32.8K |
| #16 | Grok 4 Fast | grok-4-fast-non-reason... | 85.20 | 87.0 | 86.0 | 89.0 | 75.5 | 88.5 | 93.0 | $0.20 / $0.50 | 2M | 8.2K |
| #17 | o1 | o1 | 84.80 | 87.5 | 81.0 | 88.5 | 78.0 | 89.0 | 65.0 | $15.00 / $60.00 | 200K | 100K |
| #18 | DeepSeek V3.1 Thinking | deepseek-reasoner | 84.21 | 87.0 | 71.0 | 90.5 | 82.5 | 90.0 | 40.0 | $0.07 / $1.68 | 128K | 32.8K |
| #19 | o3-mini | o3-mini | 83.60 | 88.0 | 75.0 | 94.0 | 72.0 | 89.0 | 92.0 | $1.10 / $4.40 | 200K | 100K |
| #20 | DeepSeek V3.2 | deepseek-v3.2-exp | 83.40 | 86.5 | 79.0 | 88.5 | 76.0 | 87.0 | 95.0 | $0.07 / $0.14 | 160K | 8.2K |
| #21 | Grok 3 | grok-3 | 82.40 | 83.5 | 82.0 | 86.0 | 76.0 | 84.5 | 68.0 | $3.00 / $15.00 | 131.1K | 8.2K |
| #22 | Grok 4 | grok-4-0709 | 82.21 | 86.5 | 82.0 | 89.0 | 65.5 | 88.0 | 70.0 | $3.00 / $15.00 | 256K | 8.2K |
| #23 | DeepSeek R1 | deepseek-reasoner | 82.10 | 88.0 | 65.0 | 92.0 | 72.0 | 93.5 | 25.0 | $0.55 / $2.19 | 128K | 32.8K |
| #24 | Llama 4 Maverick | llama-4-maverick-17b-1... | 80.60 | 80.5 | 79.5 | 86.5 | 74.5 | 82.0 | 95.0 | $0.20 / $0.80 | 1M | 8.2K |
| #25 | Grok 3 Mini | grok-3-mini | 79.60 | 80.0 | 80.0 | 83.0 | 73.0 | 82.0 | 82.0 | $0.30 / $0.50 | 131.1K | 8.2K |
Showing 25 of 31 models