Math
Mathematical problem solving
182 datasets
Name | Best Model
ProverBench | DeepSeek-Prover-V2-671B (CoT)
GSM-Symbolic | o1-preview
MATH | GPT-2 1.5B
ProcessBench | o1-mini
Mathador-LM | Llama-3-70B
IMO-Bench | Gemini Deep Think (IMO Gold)
FrontierMath | o1-preview
MiniF2F | GPT-f
OlympiadBench | GPT-4
MathVista | InternVL2-Pro
MATH-Perturb | o1-mini
REASONEVAL | REASONEVAL (Llemma-34B)
Omni-MATH | OpenAI o1-mini
OMEGA | DeepSeek-R1
MathConstruct | GPT-5
FormalMATH | DeepSeek-V2-671B
QRDATA | GPT-4
ME2 | Gemini 2.0 Flash
Math Reasoning Benchmark | GPT-4o
GSM-Infinite | DeepSeek-R1
OptiBench | GPT-4o
MathBookEval | Qwen2.5-VL-72B
IneqMath | GPT-5 (medium, 30K)
GSM-IC | LTM + SC (code-davinci-002)
MGSM | PaLM-540B
GSM-Plus | Human
REST | Qwen-QwQ-32B
NUPA | GPT-4o
LiveAoPSBench | DeepSeek-R1-Distill-Qwen-32B
FinanceMATH | Human Expert (Open-book)
Mamo | Llama-3.1-405B-instruct
OlymMATH | Gemini 2.5 Pro Exp 0325
CHASE | Gemini-1.5-Pro
PutnamBench | Hilbert
MWPBENCH | GPT-4
MATHCHECK | O1-preview
ArithmAttack | Llama3
Math-VR | Qwen3-VL-235B-A22B-Thinking
SciBench | GPT-4-Turbo
GeomVerse | Human
MathCanvas-Bench | Gemini-2.5-Pro
MCLM | o3-Mini
We-Math | Gemini-2.5-Pro
GeoTrust | OpenAI-o3
GSM1k | gpt-4o
MathChat | Mistral-MathChat
DynaMath | Zero-shot Claude-3.5
Cultural GSM8K | Claude 3.5 Sonnet
GeoSense | Gemini 1.5 Pro
CMM-Math | GPT-4o
Hard2Verify | GPT-5
MathArena | GPT-5 (HIGH)
DocMath-Eval | Gemini-1.5-Pro
COUNTERMATH | Deepseek-R1
FATE-M | REAL-Prover
Geometry3K | Inter-GPS
HARDMath | o1-mini
MathOdyssey | OpenAI o1
DeepMath-Creative | GPT O3-mini
REWARDMATH | Internlm2-7b-reward
RealMath | o3
ErrorRadar | Human
CHAMP | GPT-4 Turbo
U-MATH | Gemini-1.5-Pro
Numberland | ChatGPT o1
MM-MATH | Human
REASONZOO | Qwen3-235B-A22B
UniGeo | Geoformer + Pretraining
SOLIDGEO | Human
CombiBench | Kimina-Prover Preview
Libra Bench | Libra-RM-32B-MATH
Scheherazade | o1-preview
TABMWP | Docugami-MATATA-8B
Advanced Reasoning Benchmark | gpt-4-0314
Geoint | Geoint-R1
ModelingBench | Human Expert
AMO-Bench | GPT-5-Thinking (High)
SVAMP | Graph2Tree-R
FERMAT | GEMINI-1.5-PRO
LeanGeo | Gemini 2.5 Pro
GeoGramBench | Qwen3-235B-Thinking-2507
GeoEval | WizardMath-70B
MATP-BENCH | OpenAI-o1
Ineq-Comp | DeepSeek-Prover-V2-7B
MathReal | DeepSeek-R1
GeomRel | GPT-4o
PROOFBENCH | O3
GPSM4K | GPT-4
RV-BENCH | o3-mini
FinanceReasoning | OpenAI o1
StepMathBench | o1-mini
UGMathBench | OpenAI-o1-mini-2024-09-12
ReliableMath | DeepSeek-R1
MathGAP | DeepSeek-R1
HARP | o1 mini
VisionGraph | GPT-4V (DPR) w/ Python
Extended Grade-School Math | Claude-3-opus
Formal Problem-Solving Benchmarks | InternLM2.5-StepProver
ConsistencyCheck | ReForm-32B
UTMath | o1-mini
MMATH | o3-mini
CMATH | GPT-4
FineMath | GPT-4
PutnamGAP | gpt-4o-mini
Unreasonable Math Problems | Grok3-Reasoning
TriMaster100 | SSC-CoT (GPT-3.5)
MathHay | Gemini-1.5-Pro-002
FINEREASON | o1
TreeCut | o3-mini
PMC | Qwen2.5 3B
MATH 401 | gpt-4
SceMQA | GPT4-V (Zero-shot)
FormulaReasoning | Human
EasyMath | AceMath-1.5B
GSM-Agent | o3
AgentCoMa | Llama3.3 70B Instruct
MATH-Beyond | Qwen3-8B
JEEBENCH | GPT-4 + CoT + SC@8
BrokenMath | GPT-5
CogMath | DeepSeek-R1
MMSciBench | Gemini 1.5 Pro 002
LILA | Codex (code-davinci-002)
Invalsi Benchmarks | llama 3.1 70b instruct
MME-SCI | Doubao-Seed-1.6
ConceptMath | GPT-4
CREATIVEMATH | Gemini-1.5-Pro
GSM-MC | Qwen3-8B
NumGLUE | Human
Spoken-MQA | Whisper-Qwen2.5-Math-7B-Instruct
StatEval | GPT-5
SMART | o3
PATCH! | GPT-4V
RIMO | DeepSeek-R1-671B
MathMist | GPT-OSS 20B
ASDiv | GTS
Basic Math Benchmark | Llama-2-13b
EvolMathEval | DeepSeek-R1
TRIGO | GPT-2_L-PACT-E
FMC | DEEPSEEK-R1
VisAidMath | GPT-4V
KMath | GPT-4
MorphoBench | o3
SciDA | Gemini-2.5-pro.preview.0506.google.ci
IsarStep | HAT
BeyondX | GPT-4
SuperCLUE-Math6 | GPT-4-1106-Preview
AVI-MATH | AVI-Math (Ours)
MATH-Struct | Qwen-2-7B
FAULTYMATH | Gemini-1.5-Pro
ASyMOB | Gemini-2.5 Flash (no code)
Putnam-AXIOM | o1-preview
Kangaroo Language Test | Gemini 2.0 Flash
Mathematical Topics Tree | o1-mini
MMTutorBench | Gemini-2.5-Pro
AI4Math | o3 mini
Combi-Puzzles | GPT-4
CM17K | NS-Solver
UAV-Math-Bench | o1
FATE | o3
RoMath | Mathstral-7b-v0.1 (0-shot)
NOAHQA | Human
DOoM | Gemini 2.5 Pro
SKYLENAGE | GPT-5-20250807
AlgGeoTest | Gemini 2.5 Pro
IntegralBench | Qwen3-235B-A22B
InterMWP | LogicSolver
Guji_MATH | DeepSeek R1
Machine Number Sense | ResNet
ORCA | Gemini 2.5 Flash
MAVEN | CoreThink/openai/gpt-oss-120b
EffiReason-Bench | TokenSkip
DynaSolidGeo | Qwen3-VL-30B-A3B-Thinking
MatheMagic | GPT-5
EQUATE | Q-REAS
MathRobust-LV | GPT-5
MathBode | DeepSeek V3.1
ExtremBench | Qwen3-4B-Thinking-2507
GanitBench | GPT-4o mini (Zero-shot CoT)
PuzzleClone | ChatGPT-o3
StreetMath | Qwen3-4B-Instruct
BanglaMATH | DeepSeek-V3
MathSticks | Human