Leaderboard

Explore how AI models perform across our five core evaluation categories. Rankings are based on real-world conversations and human evaluations, measuring what truly matters in an AI assistant.
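The page does not say how the overall score is computed. The listed numbers suggest that it is the unweighted mean of the five category scores: for every row in the table below, that mean matches the overall column to within 0.02. The sketch below checks this on a few rows copied from the table. The helper names `overall` and `matches_listed`, and the 0.05 rounding tolerance, are illustrative assumptions rather than the leaderboard's published method.

```python
from statistics import mean

# Rows copied from the leaderboard: (model, listed overall, five category scores).
# The category names are not shown on the page, so they stay unnamed here.
ROWS = [
    ("Gemini 3 Pro", 92.30, [93.3, 91.3, 94.0, 88.7, 94.3]),
    ("GPT 5.2", 91.59, [93.5, 90.7, 96.5, 82.0, 95.3]),
    ("Claude Sonnet 4.5", 91.22, [91.5, 88.8, 94.0, 89.5, 92.3]),
    ("Grok 4", 82.21, [86.5, 82.0, 89.0, 65.5, 88.0]),
]


def overall(categories: list[float]) -> float:
    """ASSUMED overall score: the unweighted mean of the five category scores."""
    if len(categories) != 5:
        raise ValueError("expected exactly five category scores")
    return mean(categories)


def matches_listed(listed: float, categories: list[float], tol: float = 0.05) -> bool:
    """Return True if the listed overall is within `tol` of the recomputed mean.

    The tolerance absorbs rounding: category scores are displayed to one
    decimal place, so the recomputed mean can differ in the second decimal.
    """
    return abs(overall(categories) - listed) <= tol


for name, listed, cats in ROWS:
    print(f"{name}: listed {listed:.2f}, mean {overall(cats):.2f}, "
          f"match={matches_listed(listed, cats)}")
```

The small residual gaps, such as 92.32 recomputed against 92.30 listed for Gemini 3 Pro, are consistent with the site averaging unrounded category scores before display.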

| Rank | Model | Model ID | Overall | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Other score | Price (input / output) | Context | Max output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | Gemini 3 Pro | gemini-3-pro-preview | 92.30 | 93.3 | 91.3 | 94.0 | 88.7 | 94.3 | 60.0 | $2.00 / $12.00 | 2M | 65.5K |
| #2 | GPT 5.2 | gpt-5.2 | 91.59 | 93.5 | 90.7 | 96.5 | 82.0 | 95.3 | 75.0 | $1.75 / $14.00 | 400K | 128K |
| #3 | Claude Sonnet 4.5 | claude-sonnet-4-5 | 91.22 | 91.5 | 88.8 | 94.0 | 89.5 | 92.3 | 68.5 | $3.00 / $15.00 | 200K | 64K |
| #4 | GPT 5.1 | gpt-5.1 | 90.92 | 92.5 | 90.5 | 95.0 | 82.1 | 94.5 | 80.0 | $1.25 / $10.00 | 400K | 128K |
| #5 | Claude Opus 4.5 | claude-opus-4-5-202511... | 90.38 | 90.5 | 90.7 | 95.3 | 81.6 | 93.8 | 60.0 | $5.00 / $25.00 | 200K | 64K |
| #6 | Grok 4.1 Thinking | grok-4.1-thinking | 89.51 | 91.5 | 89.3 | 94.0 | 79.8 | 93.0 | 62.0 | $3.00 / $15.00 | 256K | 8.2K |
| #7 | GPT 5 | gpt-5 | 89.29 | 91.0 | 87.7 | 94.0 | 80.3 | 93.5 | 72.0 | $1.25 / $10.00 | 400K | 128K |
| #8 | o3 | o3-2025-04-16 | 88.84 | 89.5 | 90.0 | 93.2 | 77.5 | 94.0 | 58.0 | $2.00 / $8.00 | 200K | 100K |
| #9 | Grok 4.1 | grok-4.1 | 88.63 | 90.5 | 89.9 | 92.5 | 78.3 | 92.0 | 78.0 | $3.00 / $15.00 | 256K | 8.2K |
| #10 | Claude Opus 4.1 | claude-opus-4-1 | 88.30 | 90.0 | 83.5 | 92.0 | 85.5 | 90.5 | 54.0 | $15.00 / $75.00 | 200K | 8.2K |
| #11 | ChatGPT 4o | chatgpt-4o-latest | 87.30 | 88.5 | 90.5 | 89.0 | 81.5 | 87.0 | 82.2 | $5.00 / $15.00 | 128K | 16.4K |
| #12 | Claude Haiku 4.5 | claude-haiku-4-5 | 86.80 | 87.5 | 82.0 | 90.0 | 86.5 | 88.0 | 93.0 | $1.00 / $5.00 | 200K | 64K |
| #13 | Grok 4 Fast Reasoning | grok-4-fast-reasoning | 86.70 | 89.0 | 86.5 | 91.5 | 76.0 | 90.5 | 88.0 | $0.20 / $0.50 | 2M | 8.2K |
| #14 | Gemini 2.5 Pro | gemini-2.5-pro | 86.47 | 88.3 | 85.3 | 91.3 | 78.3 | 89.2 | 60.4 | $2.00 / $12.00 | 1M | 65.5K |
| #15 | o4-mini | o4-mini | 86.30 | 89.5 | 84.5 | 94.5 | 72.0 | 91.0 | 95.0 | $1.10 / $4.40 | 200K | 100K |
| #16 | DeepSeek V3.2 Thinking | deepseek-reasoner-v3.2 | 86.03 | 89.0 | 74.0 | 93.5 | 79.7 | 94.0 | 45.0 | $0.14 / $0.28 | 160K | 32.8K |
| #17 | GPT-5 Mini | gpt-5-mini | 85.80 | 86.0 | 83.0 | 91.0 | 81.0 | 88.0 | 94.0 | $0.25 / $2.00 | 400K | 128K |
| #18 | Grok 4 Fast | grok-4-fast-non-reason... | 85.20 | 87.0 | 86.0 | 89.0 | 75.5 | 88.5 | 93.0 | $0.20 / $0.50 | 2M | 8.2K |
| #19 | o1 | o1 | 84.80 | 87.5 | 81.0 | 88.5 | 78.0 | 89.0 | 65.0 | $15.00 / $60.00 | 200K | 100K |
| #20 | DeepSeek V3.1 Thinking | deepseek-reasoner | 84.21 | 87.0 | 71.0 | 90.5 | 82.5 | 90.0 | 40.0 | $0.07 / $1.68 | 128K | 32.8K |
| #21 | o3-mini | o3-mini | 83.60 | 88.0 | 75.0 | 94.0 | 72.0 | 89.0 | 92.0 | $1.10 / $4.40 | 200K | 100K |
| #22 | Gemini 3 Flash | gemini-3-flash-preview | 83.46 | 88.5 | 76.0 | 90.4 | 72.0 | 90.4 | 95.0 | $0.50 / $3.00 | 1M | 65.5K |
| #23 | DeepSeek V3.2 | deepseek-v3.2-exp | 83.40 | 86.5 | 79.0 | 88.5 | 76.0 | 87.0 | 95.0 | $0.07 / $0.14 | 160K | 8.2K |
| #24 | Grok 3 | grok-3 | 82.40 | 83.5 | 82.0 | 86.0 | 76.0 | 84.5 | 68.0 | $3.00 / $15.00 | 131.1K | 8.2K |
| #25 | Grok 4 | grok-4-0709 | 82.21 | 86.5 | 82.0 | 89.0 | 65.5 | 88.0 | 70.0 | $3.00 / $15.00 | 256K | 8.2K |
Showing 25 of 39 models
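Each price cell pairs an input price with an output price, but the page does not state the unit. The sketch below assumes USD per one million tokens, a common convention for these APIs, and estimates the cost of a single request. `parse_price_pair` and `request_cost` are hypothetical helpers, and the workload of 10,000 input and 2,000 output tokens is an arbitrary example.

```python
import re

# Matches a price cell such as "$2.00 / $12.00" (input price / output price).
PRICE_CELL = re.compile(r"\$(\d+(?:\.\d+)?)\s*/\s*\$(\d+(?:\.\d+)?)")


def parse_price_pair(cell: str) -> tuple[float, float]:
    """Split a price cell into (input_price, output_price)."""
    match = PRICE_CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def request_cost(cell: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request.

    ASSUMPTION: both prices are USD per 1,000,000 tokens; the page does not say.
    """
    price_in, price_out = parse_price_pair(cell)
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Price cells copied from the table; the workload below is hypothetical.
PRICES = {
    "Gemini 3 Pro": "$2.00 / $12.00",
    "GPT 5.2": "$1.75 / $14.00",
    "Claude Haiku 4.5": "$1.00 / $5.00",
    "DeepSeek V3.2": "$0.07 / $0.14",
}

for model, cell in PRICES.items():
    print(f"{model}: ${request_cost(cell, 10_000, 2_000):.5f}")
```

Output pricing matters even at this input-heavy workload. GPT 5.2 has the lower input price, yet it comes out slightly more expensive than Gemini 3 Pro. That comparison holds whatever the unit is, as long as both columns share it.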