Time window: 15/04/2025 to 30/04/2025
21 problems from 17 repositories selected within the current time window.
You can adjust the time window by changing the problems' release start and end dates. Evaluations highlighted in red may be contaminated: they include tasks that were created before the model's release date.
Rank  Model                               Resolved Rate (%)  Resolved Rate SEM (±)  pass@5 (%)
1     gpt-4.1-2025-04-14                  16.2%              1.17%                  23.8%
2     DeepSeek-V3-0324                    13.3%              3.16%                  23.8%
3     DeepSeek-V3                         11.4%              1.90%                  14.3%
4     Qwen3-235B-A22B no-thinking         10.5%              1.78%                  14.3%
5     Qwen3-32B no-thinking               10.5%              0.95%                  14.3%
6     Qwen3-32B thinking                  9.5%               0.00%                  9.5%
7     Qwen3-235B-A22B thinking            8.6%               1.76%                  19.0%
8     Llama-4-Maverick-17B-128E-Instruct  7.6%               1.90%                  14.3%
9     Llama-3.3-70B-Instruct              7.6%               1.17%                  14.3%
10    Llama-4-Scout-17B-16E-Instruct      4.8%               2.13%                  14.3%
11    Qwen2.5-72B-Instruct                3.8%               0.95%                  4.8%
12    gemma-3-27b-it                      3.8%               0.95%                  9.5%
13    Qwen2.5-Coder-32B-Instruct          1.0%               0.95%                  4.8%
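
The page does not define how the three metric columns are computed. The sketch below shows one plausible reading, assuming each model is run 5 times on all 21 problems (the run count is inferred from the pass@5 column, not stated on the page). Under that assumption, the resolved rate is the mean of the per-run resolved fractions, the SEM is the sample standard deviation of those fractions divided by the square root of the number of runs, and pass@5 is the share of problems resolved in at least one run. The `summarize` function and the `runs` matrix are hypothetical and use made-up data, not the leaderboard's actual per-run results.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(results):
    """results[r][p] is True if run r resolved problem p.

    Returns (resolved_rate, sem, pass_at_k) as fractions in [0, 1],
    where k is the number of runs.
    """
    n_runs = len(results)
    n_problems = len(results[0])

    # Fraction of problems resolved in each individual run.
    per_run_rates = [sum(run) / n_problems for run in results]

    # Mean over runs, and standard error of that mean (sample std, n - 1).
    resolved_rate = mean(per_run_rates)
    sem = stdev(per_run_rates) / sqrt(n_runs)

    # A problem counts for pass@k if any of the k runs resolved it.
    solved_any = sum(
        any(run[p] for run in results) for p in range(n_problems)
    )
    pass_at_k = solved_any / n_problems
    return resolved_rate, sem, pass_at_k


# Hypothetical data: 5 runs over 21 problems, with a single problem
# resolved in a single run.
runs = [[False] * 21 for _ in range(5)]
runs[0][3] = True

rate, sem, p5 = summarize(runs)
print(f"{rate:.1%} {sem:.2%} {p5:.1%}")  # prints "1.0% 0.95% 4.8%"
```

With 21 problems, every pass@5 value in the table is a multiple of 1/21 (for example, 4.8% is 1/21 and 23.8% is 5/21), which fits this reading, though the page itself does not confirm the exact formulas.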