Multimodal Understanding
Image and video comprehension
501 datasets
Model Leaderboard
No.  Model              Score
1    Gemini 3 Pro       1.00
2    o3 (high)          1.00
3    GPT-5 (high)       0.94
4    o3                 0.90
5    GPT-5.1            0.90
6    Grok 3             0.57
7    Claude Sonnet 4.5  0.56
8    Gemini 2.5 Pro     0.54
9    Claude 3.7 Sonnet  0.54
10   o1                 0.54
11   GPT-4.1 Mini       0.53
12   Claude Opus 4.1    0.51
13   Claude Opus 4      0.48
14   o4-mini (high)     0.48
15   GPT-4.1            0.40
16   Z.AI: GLM 4.5V     0.40
17   Claude Sonnet 4    0.36
18   Claude 3.5 Sonnet  0.28
19   Pixtral 12B        0.23
20   Gemini 2.5 Flash   0.11
21   GPT-4o             0.11
22   Llama 4 Maverick   0.02
23   Claude 3 Sonnet    0.01
24   GPT-4o-mini        0.00
25   Claude 3 Haiku     0.00
Name: Best Model
MMMU: GPT-5.1
MMMU-Pro: Gemini 3.0 Pro
CharXiv: o3 (high)
MM-BrowseComp: o3
Video-MMMU: GPT-5-thinking
ActiView: Qwen2.5-VL-7B
VSI-Bench: Gemini-1.5 Pro
All-Angles Bench: Qwen2.5-VL-72B
FG-BMK: Gemini-2.0-flash
VRC-Bench: GPT-4o
MVBench: VideoChat2 (Mistral-7B)
VisuLogic: Human
MMLongBench: Gemini-2.5-Pro
R1-Onevision-Bench: Gemini-2.0-Flash
Multimodal Visual Patterns: Human
VisDoMBench: GPT-4o
VisualWebArena: Human Performance
VCR-Bench: o1
BrowseComp-VL: WebWatcher-32B
VMB: VideoMindPalace
MMIE: GPT-4o | SD-XL
MME Unify: Gemini2.0-flash-exp
MMBench: InternLM-XComposer2
ScreenSpot-Pro: Holo2-30B-A3B
EgoTaskQA: HCRN
EMMA: gemini-2.0-flash-thinking-exp-01-21
DisCRn: X-Instruct Proj. (13b)
MMEB-V2: VLM2Vec-V2 (2B)
MathVerse: GPT-4V
MINERVA: Human performance
RH-Bench: Ocean-R1-7B
AutoBench-V: Claude-3.5-Sonnet
VCBench: Human
LENS: Gemini2.5-Pro
VisualAgentBench: gpt-4o-2024-05-13
LVBench: Gemini-2.5-Pro
HallusionBench: GPT-4V (Sep 2023)
MotionBench: Qwen2VL-72B
GSR-BENCH: LLAVA-NEXT-YI 34B
CoreCognition: GPT-o1
VL-ICL Bench: LLaVA-OneVision-72B
Spatial457: Human
SpatialEval: GPT-4o (Vision-text)
SOK-Bench: GPT4v
OmniBench: Human Evaluator
DocVQA: bert-large-squad
Perception Test: Human
DriveLMM-o1: DriveLMM-o1
VL-RewardBench: Gemini-1.5-Pro
MMIU: GPT-4o
MM-Vet: GPT-4V
ViP-Bench: GPT-4V-turbo-detail:high
MMStar: GPT4V (high)
VLM2-Bench: Claude-3.7-sonnet
FAVOR-Bench: Gemini-1.5-Pro
MMFakeBench: GPT-4V
TreeBench: o3-0416
MM-Escape: Human
M3CoT: Human
MovieChat-1K: MovieChat
MC-Bench: Humans
SpatialScore: InternVL 3 78B
CoMT: Gemini-Pro
LEGO-Puzzles: GPT-4o
VQA: snubi-naverlabs
3D-CoT: DeepSeek-R1-Distill-Llama-8B (Unmarked CoT)
SEED-Bench: InstructBLIP Vicuna
MMSI-Bench: Human Level
Video-Holmes: Gemini-2.5-Pro
SELF-BENCH: SD 3-m (discffusion)
MuirBench: Human
MMVU: Human Oracle
RealUnify: Nano Banana
ColorBench: GPT-o3 (API)
NExT-GQA: Human
ChartX: ChartVLM-L
VLRMBench: Gemini-2.0-Flash
MARS-Bench: QwenAD-SFT-RFT (7B)
UrbanVideo-Bench: Qwen-VL-Max-latest[32f]
3DSRBench: LLaVA-NeXT-8B
MM-Vet v2: Claude 3.5 Sonnet
VERIFY: OpenAI-o1
ViVerBench: Human
REASONMAP: OpenAI o3
FakeBench: GPT-4V
MangaVQA: MangaLMM
Multimodal Multi-image Reasoning Benchmark: GPT-o1-20241217
ALM-bench: GPT-4o
WorldSense: Gemini 2.5 Pro Adaptive-Thinking
MEGA-Bench: Claude-3.5-Sonnet (1022)
Open3D-VQA: Qwen2-VL-7B (finetuned)
VLM4D: Human Performance
MV-MATH: Seed1.5-VL (thinking)
M²RAG: Qwen2.5-72B-Instruct
VisualPuzzles: o4-mini
MIRA: GPT-5
IntentBench: HumanOmniV2
VQA v2.0: MCB
MuMA-ToM: Human
IV-Bench: Qwen2.5-VL-72B
LogicVista: LLaVA-NeXT-Nous-Hermes-Yi-34B
Agent-X: OpenAI o4-mini
M-DocSum-Bench: M-DocSum (7B)
SHIELD: mPLUG-owl
RBench-V: Human Experts Score
EmoBench: EmoLLM
EgoMem: VideoLucy
H2VU: Gemini-1.5-Pro
MME-CoF: Sora-2
GQA: Humans
MMT-Bench: GPT4o
GSM8K-V: Human
MMBench-Video: GPT-4o-[1fps]
MIRB: GPT-4V
Scientists' First Exam: GPT-o3
MMVM: CoLVA-Qwen2VL-7B (Ours)
MIBench: GPT-4o
Q-Bench-Video: GPT-4o
MMToM-QA: Human
VisNumBench: Human
Video-MMLU: Claude-3.5-sonnet
IllusionVQA: Human
MARBLE: GPT-o3
VideoEval-Pro: gemini-2.5-pro
HALLUCINOGEN: Qwen2-VL
STAR: ClipBERT
PhysToolBench: HUMAN(BEST)
R-Bench: OpenAI GPT-5
VOILA: Human
ChartBench: GPT-4O
MMCR: GPT-4o
ALLVB: Claude 3.5 Sonnet
Face Forgery Detection VQA Benchmark: LlaVa-1.5
VSP: GPT-4o
JudgeAnything: Gemini-1.5-Pro
EgoIntention: MiniGPT-v2
AVHBench: AVHModel-Align-FT
FineCops-Ref: CogVLM†
MathScape: Human
POPVQA: LLaVA34B
Needle In A Multimodal Haystack: Human
GRASP: Human Subjects
ViTextVQA: ViTextBLIP-2 (base)
ICQ: TR-DETR
MultiChartQA: Human
H-POPE: LLaVa
SITE: Human
CLEVRER: NS-DR
CulturalVQA: GPT-4
RTV-Bench: GPT-4o
SEED-Bench-2: SEED-LLaMA
MDK12-Bench: Gemini2-thinking
M4U: GPT-4o
OCR-Reasoning: DouBao-1.5-Vision-Pro
InfiniBench: GPT-4o
OST-Bench: Human-Level
SAV-Caption: VoCap
M2KR: RA-VQAv2 w/ PreFLMR
VAGUE: Human
Video-Bench: Video-ChatGPT
CMMMU: GPT-4o(202405130)
SEED-Bench-2-Plus: GPT-4V
MSQA: LEO (FT)
SpatialViz-Bench: Gemini-2.5-pro
SciVideoBench: Gemini-2.5-Pro
CONFLICTVIS: GPT-4o
HLV-1K: LLaVA-Video (72B)
Uni-MMMU: GPT4.1 + GPT-image
MuCR: Human
SIV-Bench: Gemini-2.5-Pro
MMReason: GPT-4o-1120
ArtifactsBench: GPT-5
TDBench: Gemini 2.5 Pro
MAVERIX: Claude 3.5 Sonnet
MMR-V: Human
MMSearch-Plus: o3
GeoQA: Human (Text-Diagram)
FigureQA: Relation Network (RN)
VS-Bench: o3
Multimodal Inconsistency Reasoning: o1 (1217)
SciVerse: GPT-4o
VIP: VICUNA-13B
Vision LLM Safety Benchmark: GPT-4V
FCMR: Claude 3.5 Sonnet
MMWorld: GPT-4o
HumanVBench: Human
VideoReasonBench: Human
Visual Commonsense Reasoning: Human Performance
CLEVR: Human
ViC-Bench: o3
OK-VQA: Prophet
SPORTU: Qwen2-VL-72B
MVTamperBench: VILA1.5-40b
VideoMathQA: GPT-o4-mini
Spatial-MM: GPT-4o
WikiMixQA: Human Experts
BLINK-Twice: Gemini-2.5-pro ✩
UNPIE: GPT-4V
CompreCap: Human
TIR-Bench: o3-TU
DrVD-Bench: Gemini 2.5 Pro
MMLU-Reason: Gemini-2.5 Pro
VisualToolBench: GPT-5-think
ActivityNet-QA: E-SA
ROVER: Nano Banana
CrossWordBench: DeepSeek-R1
VidComposition: Gemini-2.5-flash-preview
HumaniBench: Qwen2.5-7B
LLMGeo: Gemini
EgoExoBench: Gemini 2.5 Pro
II-Bench: Qwen-VL-MAX
DIVE: GRT
A-Bench: HUMAN (BEST)
IconQA: Human
11Plus-Bench: GPT-o3
AVA-Bench: SigLIP-2
MaRs-VQA: Human
UrBench: Human
MUCAR: Human
Causal-VidQA: Human
Reasoning-OCR: GPT-4o-20240806
MMEvalPro: Human (Graduate Student)
OmniVideoBench: Gemini-2.5-Pro
PSG: PSGTR (60 epochs)
VISFACTOR: Claude 3.7 Sonnet
FaceXBench: Qwen2-VL-72b-Instruct
REVERIE: Human
Argus Inspection: GPT-4.1-2025-04-14
VFaith-Bench: Gemini-2.5
MMComposition: InternVL2-40B
UNO-Bench: Gemini-2.5-Pro
LIME: InternVL-2 (40B)
FlowVQA: GPT-4V
Social Genome: Gemini-1.5-Flash
MIHBench: LLaVA-NeXT-Interleave + OURS (DAB)
MVU-Eval: Gemini 2.5 Pro
HumanPCR: o3
SeriesBench: GPT-4o + PC-DCoT
LVLM-eHub: InstructBLIP
LogicOCR: Gemini-2.5-Pro
LVLM-Playground: GPT-4o
SMIR-BENCH: Claude-3-Opus-20240229
ChartMind: GPT-4o
CURE: Human
Ego-QA and MAD-QA: Human
Q-Bench+: BlueImage-GPT
Visual7W: Human (Question + Image)
EXAMS-V: GPT-4 (w/ OCR, captions)
VNBench: Gemini 1.5 Pro
MCUB: DAMC
BlackSwanSuite: Human
PRISM-Bench: SkyWork R1V3-38B
TIMEBench: InternVL 2.5 + TIME
TempVS: InternVL2.5 78B-MPO
MixEval-X: Claude 3.5 Sonnet
OmniEval: gemini-2.5-pro-preview-05-06
Multi-image Relational Association: GPT4o
EgoGazeVQA: Qwen2.5-VL-72B
MIRAGE: QwenVL-2.5-72B
HumanSense: InternVL3-8B
HEMM: GEMINI
AVUT: Gemini 1.5 Pro
MAC: Step-3
V2P-Bench: Human Performance
ReMI: Human
CompBench: GPT-4o
VisualTrans: o3
CMMU: GPT-4V
MATE: Human
GDI-Bench: GDI-Model
EEmo-Bench: GPT-4o
LogicBench: Human
POLYMATH: Claude-3.5 Sonnet
CVBench: GPT-4o
VisioMath: Gemini 2.5 Pro
PhysicsArena: Qwen-VL-Max
VideoAutoArena: GPT-4o
MMDocBench: GPT-4o
HIS-Bench: HIS-GPT
oVQA: BLIP-2 OPT
VideoGLUE: CoCa
MMPerspective: Gemini-2-flash (CoT)
Event-Bench: GPT-4o
MATHLENS: Gemini-2.5-Flash (Thinking)
HVSBench: Human
Mementos: GPT-4V (Sequential)
BabelBench: ChatGPT 4
MMRel: GPT-4o
TimeLogic: SeViLA
P²GB: GPT-4V
PCA-Bench: GPT4-Vision-1106
Corrupted Visual Genome: HiKER-SGG
NeMoBench: Human Experts
CameraBench: Gemini 2.0 Flash
AlignMMBench: Qwen2-VL
Earth Observation VLM Benchmark: GPT-4V
IRR: LLaVA-NeXT (Mistral-7B)
SIRI-Bench: Doubao-1.5-pro (Textual Rep.)
ASCIIEval: GPT-5
Video Reasoning Evaluation Suite: Human
Face-Human-Bench: InternVL-Chat-v1.2-Plus
Human-MME: GLM-4.5V
CogBench: Oracle
ReadBench: Qwen2.5-VL 7B
WebQuest: GPT-4V
JourneyBench: GPT-4o
VQA-CP: GVQA
VidText: Human
HumanVideo-MME: Qwen2.5-VL 32B
AbilityLens: Qwen2.5VL-72b
NTSEBENCH: OpenAI o1-preview
MUSIC-AVQA: Our method (Spatio-Temporal Grounding)
SNS-Bench-VL: Gemini-2.5-pro-exp-03-25
NPHardEval4V: GPT-4V
Clue-Visual QA: Qvq-72B-preview
CODIS: Human
BEAF: Shikra (7B)
Dr.V-Bench: Human
TEMROBBENCH: mPLUG-Owl3
Compositional Visual Relations: ResNet-50
MESH: InternVL2.5-78B
MMTABQA: GPT-4o
COVER: InternVL2.5-78B
SportR: Qwen-VL-7B (SFT+RL)
ARB: GPT-4o
TUNA: GPT-4o (0806)
VL-CheckList: ViLT
VideoRewardBench: Gemini-2.5-Pro (2025-06)
Illusory VQA: Human
MT-Video-Bench: Gemini 2.5 Pro
GlitchBench: GPT-4V
CAMEL-Bench: GPT-4o
ImplicitQA: Human Baseline
M3STR: Qwen2.5-VL 72B-Instruct
ViLBias: RoBERTa + CLIP (FT)
TEMPO: MLLC (Ours) Context Sup. Test
MIR: Qwen2-VL
Video-OCR Bench: Qwen2-VL-7B
AVEB: FAVOR 13B (audio-visual)
GePBench: Human
REBUS: GPT-4o
UAL-Bench: VideoLLaMA
MangaUB: GPT-4o
ING-VP: Claude-3.5 Sonnet
VideoVista-CulturalLingo: Gemini-2.0-Flash
CSVQA: o1-preview
AVTrustBench: GPT-4o
YouCookII-TVS: GPT-4-turbo
Med-MIM: Med-Mantis
EgoCVR: TFR-CVR (Ours)
VidSitu: Human
OV-VG: Ours (Swin-T, O365+, RefC finetune)
AURA: Ola
VALUE: craig.starr (ensemble)
ODI-Bench: o3
MM-SpuBench: GPT-4V
NoTeS-Bank: Human Baseline
FinMMR: Claude 3.7 Sonnet (64K)
PACS: Human
MultiVENT-G: GPT-4o
MFC-Bench: Human
MMR: Claude 3.5 Sonnet
MAIA: Qwen2.5-VL-72B
MOMENTS: LLaVA-Video-72B
VS-TDX: GPT-4o
MERLIM: MiniGPT-4 (Vicuna-7B v0)
XD-Violence: VA-GPT (Vicuna-7B)
VGSI: Human
MTMEUR: Our Method
MoHoBench: Llama-3.2-90B-Vision-Instruct
Multi-Dimensional Insights: GPT-4o
MAPWise: Human
VisScience: Claude3.5-Sonnet
BenchLMM: GPT-4V
Sherlock: CLIP (RN50x64) + multitask clue learning
OBI-Bench: GPT-4O (ver. 0806)
VER-Bench: Gemini-2.5 Pro Preview
IQBench: o4-mini
TennisTV: GPT-4.1
Charting New Territories: GPT-4V
O-Bench: Human
KnowDR-REC: Qwen-VL-Max
Cops-Ref: MattNet-Mine
CausalVLBench: Gemini-2.0-Flash
BloomVQA: GPT-4V
EgoSDQES: LaViLa + QR-Adapter
MultiStAR: Qwen2-VL-72B*
ViLMA: BLIP-2
VALSE: ViLBERT 12-in-1
MR²-Bench: Seed-1.6-Embedding
VQA-Rephrasings: BAN + CC
VISE: InternVL 2.5 26B
MM-BigBench: InstructBLIP
QL-Bench: InternLM-XComposer2d5 (7B)
MM-InstructEval: GPT-4V
MetaCLUE: FT CLIP (ViT-L/14)
MMRobustness: CLIP ZS
MathOPEval: Human
TouchStone: GPT-4V
MET-Bench: GPT-4o
DesignProbe: GPT-4V
IQUAD V1: Human
iWISDM: Human
MSRVTT-P: FIT (zs)
FRAMES-VQA: SPD LoRA
AccidentBench: GPT 5
Moments-OVRE: Ours (CLIP-ViT + GPT-2)
Contra4: CREMA
MaRVL-QA: o4-mini
CRIC: ViLBERT+l_att
Bias in the Picture: Qwen2.5-VL
MOSABench: mPLUG-owl-7B
Multi-Physics: Gemini-2.5-Pro
AstroChart: Gemini-2.5-Pro
UI2V-Bench: Hailuo
JRDB-Reasoning: InternVL 2.5
EasyARC: Claude 3.7 Sonnet
UWBench: GPT-5
IMAGECODE: Human Performance
SGG Benchmark: RelDN
ReForm-Eval: BLIP-2_F
PARROT-360V: GPT-4o
EGOILLUSION: Human Evaluation
MANBench: Human (Best)
xGQA: mBERT_Ada
SEAM: GPT-5-mini
ChEF: GPT-4V
LAVA: Lava
Trust-videoLLMs: Claude 3.7 Sonnet
MME-CC: Human
GOBench: Gemini-2.5Pro
GUESSBENCH: Qwen2.5-VL-72B
CLEVR Mental Rotation Tests: Upper bound (canonical views only)
ConViS-Bench: LLaVA-OV-7B
Common-O Bench: Llama 4 Instruct Scout
Compass Direction Reasoning: Gemini 1.5 Pro
CLEVR-Ref+: IEP-Ref (700K prog.)
NEMO: Human
ReasonBench: Human Baseline
MeViS-X: Planner-Refiner
IGLUE: UC₂
Spacewalk-18: Caption-enhanced LLM
VisualQuest: Gemini-2.0-Flash-exp
MEBench: CogVLM
ST-VQA: VTA
VLUE: METER
CompareBench: Gemini 2.5 Pro
gCOG: SSTfmr
MedVidCQA: CCGS (Our Method)
Seeing Culture Benchmark: GPT-o3
LongInsightBench: Gemini2.5-Flash
Compositional Temporal Grounding: VISA
VLQA: Human
SE-CE: GPT-4.1 Mini
ComBo: ViT-B/16 (fine-tuned)
MultipanelVQA: Human
How2R: HERO
MMA-ASIA: GPT-4o
PARC: InternVL2 40B
WildQA: Human
R²-Bench: SEEM
MCTBench: GPT-4V
VQA-GEN: ViLT
VQA-LOL: LOL (full)
Probe: BridgeTower
SHOP-VRB: XNM GT/GT
HumanCog: HumanCog (Ours large)
VIVA+: GPT-4.1
M-EV²: MEEL
G-VUE: ViT-16-CLIP
Visual Genome: EM-Grounding (None)
VLM@school: QwenVL2.5 32B
Fine-Grained Image Analysis Benchmark: claude-3-5-sonnet-20241022
INTERCHART: Gemini-1.5-Pro
RPTS-Eval: GPT-4o
CFVBench: gemini-2.5-flash
V-HUB: Qwen-2.5-VL-72B
AeroEye-v1.0: THYME
SPOT Prober: InternVideo
VL-GLUE: ViLT (Fine-tuned)
SPOT: ALPRO
ImageNetVC: LLaMA-65B
VQArt-Bench: Gemini 2.5
SNARE: BLIP
SIMMC: SimpleTOD+MM
Res-Bench: mPLUG-Owl3
EmoBench-Reddit: gemini-2.5-pro
PerceptualQA: Human Baseline
CMR-SPB: Gemini 2.0 Flash
IndicVisionBench: Gemini-2.5 Flash
VLURes: Human Performance
ViMoNet-Bench: ViMoNet
ISO-Bench: Human*
BLEnD-Vis: GPT-4o
Drill-down: Drill-down_3x256
MMAO-Bench: Gemini-2.5-Pro
TrUMAn: Oracle (BERT)
PISA-Bench: GPT-4o
HVQR: KM-net
YouCookII: DORi
YouMakeup VQA Challenge: SCDM+ (I3D features)