| No. | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro | 1.00 |
| 2 | o3 (high) | 1.00 |
| 3 | GPT-5 (high) | 0.94 |
| 4 | o3 | 0.90 |
| 5 | GPT-5.1 | 0.90 |
| 6 | Grok 3 | 0.57 |
| 7 | | 0.56 |
| 8 | Gemini 2.5 Pro | 0.54 |
| 9 | | 0.54 |
| 10 | o1 | 0.54 |
| 11 | GPT-4.1 Mini | 0.53 |
| 12 | | 0.51 |
| 13 | | 0.48 |
| 14 | o4-mini (high) | 0.48 |
| 15 | GPT-4.1 | 0.40 |
| 16 | Z.AI: GLM 4.5V | 0.40 |
| 17 | | 0.36 |
| 18 | | 0.28 |
| 19 | Pixtral 12B | 0.23 |
| 20 | Gemini 2.5 Flash | 0.11 |
| 21 | GPT-4o | 0.11 |
| 22 | Llama 4 Maverick | 0.02 |
| 23 | | 0.01 |
| 24 | GPT-4o-mini | 0.00 |
| 25 | | 0.00 |

| Name | Organization | Best Model |
|---|---|---|
| MMMU | University of Victoria | GPT-5.1 |
| MMMU-Pro | Carnegie Mellon University | Gemini 3.0 Pro |
| CharXiv | University of Wisconsin | o3 (high) |
| MM-BrowseComp | Nanjing University | o3 |
| Video-MMMU | Carnegie Mellon University | GPT-5-thinking |
| ActiView | Fudan University | Qwen2.5-VL-7B |
| VSI-Bench | New York University | Gemini-1.5 Pro |
| All-Angles Bench | New York University | Qwen2.5-VL-72B |
| FG-BMK | University of Copenhagen | Gemini-2.0-flash |
| VRC-Bench | Australian National University | GPT-4o |
| MVBench | Chinese Academy of Sciences | VideoChat2 (Mistral-7B) |
| VisuLogic | University of Science and Technology of China | Human |
| MMLongBench | University of Edinburgh | Gemini-2.5-Pro |
| R1-Onevision-Bench | Zhejiang University | Gemini-2.0-Flash |
| Multimodal Visual Patterns | New York University | Human |
| VisDoMBench | University of Maryland, College Park | GPT-4o |
| VisualWebArena | Carnegie Mellon University | Human Performance |
| VCR-Bench | Huawei Noah’s Ark Lab | o1 |
| BrowseComp-VL | Alibaba Group | WebWatcher-32B |
| VMB | University of Illinois at Urbana-Champaign | VideoMindPalace |
| MMIE | University of Chicago | GPT-4o \| SD-XL |
| MME Unify | CASIA | Gemini2.0-flash-exp |
| MMBench | Shanghai AI Laboratory | InternLM-XComposer2 |
| ScreenSpot-Pro | National University of Singapore | Holo2-30B-A3B |
| EgoTaskQA | Tsinghua University | HCRN |
| EMMA | University of Washington | gemini-2.0-flash-thinking-exp-01-21 |
| DisCRn | Stanford University | X-Instruct Proj. (13b) |
| MMEB-V2 | University of Waterloo | VLM2Vec-V2 (2B) |
| MathVerse | UCLA | GPT-4V |
| MINERVA | Google DeepMind | Human performance |
| RH-Bench | UC Santa Barbara | Ocean-R1-7B |
| AutoBench-V | University of Notre Dame | Claude-3.5-Sonnet |
| VCBench | Zhejiang University | Human |
| LENS | Wuhan University of Technology | Gemini2.5-Pro |
| VisualAgentBench | Tsinghua University | gpt-4o-2024-05-13 |
| LVBench | Tsinghua University | Gemini-2.5-Pro |
| HallusionBench | University of Maryland, College Park | GPT-4V (Sep 2023) |
| MotionBench | Tsinghua University | Qwen2VL-72B |
| GSR-BENCH | George Mason University | LLAVA-NEXT-YI 34B |
| CoreCognition | Harvard University | GPT-o1 |
| VL-ICL Bench | University of Edinburgh | LLaVA-OneVision-72B |
| Spatial457 | Johns Hopkins University | Human |
| SpatialEval | University of Wisconsin–Madison | GPT-4o (Vision-text) |
| SOK-Bench | Tsinghua University | GPT4v |
| OmniBench | University of Manchester | Human Evaluator |
| DocVQA | Amazon | bert-large-squad |
| Perception Test | Google DeepMind | Human |
| DriveLMM-o1 | Mohamed bin Zayed University of Artificial Intelligence | DriveLMM-o1 |
| VL-RewardBench | HKU | Gemini-1.5-Pro |
| MMIU | Shanghai AI Laboratory | GPT-4o |
| MM-Vet | National University of Singapore | GPT-4V |
| ViP-Bench | Cruise LLC | GPT-4V-turbo-detail:high |
| MMStar | Shanghai AI Laboratory | GPT4V (high) |
| VLM2-Bench | CMU | Claude-3.7-sonnet |
| FAVOR-Bench | Fudan University | Gemini-1.5-Pro |
| MMFakeBench | University of California, Santa Barbara | GPT-4V |
| TreeBench | ByteDance | o3-0416 |
| MM-Escape | Fudan University | Human |
| M3CoT | Shanghai AI Laboratory | Human |
| MovieChat-1K | University of Washington | MovieChat |
| MC-Bench | Zhejiang University | Humans |
| SpatialScore | Tianjin University | InternVL 3 78B |
| CoMT | National University of Singapore | Gemini-Pro |
| LEGO-Puzzles | Shanghai AI Laboratory | GPT-4o |
| VQA | Microsoft | snubi-naverlabs |
| 3D-CoT | The Hong Kong Polytechnic University | DeepSeek-R1-Distill-Llama-8B (Unmarked CoT) |
| SEED-Bench | Tencent AI Lab | InstructBLIP Vicuna |
| MMSI-Bench | Shanghai AI Laboratory | Human Level |
| Video-Holmes | City University of Hong Kong | Gemini-2.5-Pro |
| SELF-BENCH | TU Darmstadt | SD 3-m (discffusion) |
| MuirBench | UCLA | Human |
| MMVU | Yale University | Human Oracle |
| RealUnify | National University of Singapore | Nano Banana |
| ColorBench | University of Maryland, College Park | GPT-o3 (API) |
| NExT-GQA | National University of Singapore | Human |
| ChartX | Shanghai Artificial Intelligence Laboratory | ChartVLM-L |
| VLRMBench | Shanghai Jiao Tong University | Gemini-2.0-Flash |
| MARS-Bench | Fudan University | QwenAD-SFT-RFT (7B) |
| UrbanVideo-Bench | Tsinghua University | Qwen-VL-Max-latest[32f] |
| 3DSRBench | Carnegie Mellon University | LLaVA-NeXT-8B |
| MM-Vet v2 | National University of Singapore | Claude 3.5 Sonnet |
| VERIFY | University of Rochester | OpenAI-o1 |
| ViVerBench | Tsinghua University | Human |
| REASONMAP | National University of Singapore | OpenAI o3 |
| FakeBench | University of Pittsburgh | GPT-4V |
| MangaVQA | The University of Tokyo | MangaLMM |
| Multimodal Multi-image Reasoning Benchmark | NUS | GPT-o1-20241217 |
| ALM-bench | Amazon | GPT-4o |
| WorldSense | Shanghai Jiao Tong University | Gemini 2.5 Pro Adaptive-Thinking |
| MEGA-Bench | University of Waterloo | Claude-3.5-Sonnet (1022) |
| Open3D-VQA | Tsinghua University | Qwen2-VL-7B (finetuned) |
| VLM4D | UCLA | Human Performance |
| MV-MATH | Chinese Academy of Sciences | Seed1.5-VL (thinking) |
| M²RAG | Nanyang Technological University | Qwen2.5-72B-Instruct |
| VisualPuzzles | Carnegie Mellon University | o4-mini |
| MIRA | Stanford University | GPT-5 |
| IntentBench | Alibaba Group | HumanOmniV2 |
| VQA v2.0 | Georgia Institute of Technology | MCB |
| MuMA-ToM | Johns Hopkins University | Human |
| IV-Bench | ByteDance | Qwen2.5-VL-72B |
| LogicVista | UCLA | LLaVA-NeXT-Nous-Hermes-Yi-34B |
| Agent-X | University of Oxford | OpenAI o4-mini |
| M-DocSum-Bench | BUPT | M-DocSum (7B) |
| SHIELD | UCLA | mPLUG-owl |
| RBench-V | Carnegie Mellon University | Human Experts Score |
| EmoBench | Wuhan University | EmoLLM |
| EgoMem | National University of Singapore | VideoLucy |
| H2VU | OPPO | Gemini-1.5-Pro |
| MME-CoF | Northeastern University | Sora-2 |
| GQA | Stanford University | Humans |
| MMT-Bench | Shanghai Artificial Intelligence Laboratory | GPT4o |
| GSM8K-V | Zhejiang University | Human |
| MMBench-Video | Shanghai AI Laboratory | GPT-4o-[1fps] |
| MIRB | Tongji University | GPT-4V |
| Scientists' First Exam | Shanghai Artificial Intelligence Laboratory | GPT-o3 |
| MMVM | Wuhan University | CoLVA-Qwen2VL-7B (Ours) |
| MIBench | Chinese Academy of Sciences | GPT-4o |
| Q-Bench-Video | Nanyang Technological University | GPT-4o |
| MMToM-QA | Harvard University | Human |
| VisNumBench | Tsinghua Shenzhen International Graduate School | Human |
| Video-MMLU | University of Washington | Claude-3.5-sonnet |
| IllusionVQA | UCLA | Human |
| MARBLE | ETH Zurich | GPT-o3 |
| VideoEval-Pro | University of Toronto | gemini-2.5-pro |
| HALLUCINOGEN | University of Maryland | Qwen2-VL |
| STAR | Shanghai Jiao Tong University | ClipBERT |
| PhysToolBench | Beihang University | HUMAN(BEST) |
| R-Bench | Carnegie Mellon University | OpenAI GPT-5 |
| VOILA | Arizona State University | Human |
| ChartBench | Tsinghua University | GPT-4O |
| MMCR | South China University of Technology | GPT-4o |
| ALLVB | National University of Defense Technology | Claude 3.5 Sonnet |
| Face Forgery Detection VQA Benchmark | University of Luxembourg | LlaVa-1.5 |
| VSP | UC Santa Barbara | GPT-4o |
| JudgeAnything | University of Illinois Chicago | Gemini-1.5-Pro |
| EgoIntention | Google DeepMind | MiniGPT-v2 |
| AVHBench | KAIST | AVHModel-Align-FT |
| FineCops-Ref | University of Electronic Science and Technology of China | CogVLM† |
| MathScape | Chinese Academy of Sciences | Human |
| POPVQA | Tel Aviv University | LLaVA34B |
| Needle In A Multimodal Haystack | Shanghai AI Laboratory | Human |
| GRASP | University of Amsterdam | Human Subjects |
| ViTextVQA | Vietnam National University | ViTextBLIP-2 (base) |
| ICQ | University of Oxford | TR-DETR |
| MultiChartQA | University of Notre Dame | Human |
| H-POPE | Saarland University | LLaVa |
| SITE | Boston University | Human |
| CLEVRER | Google DeepMind | NS-DR |
| CulturalVQA | Google DeepMind | GPT-4 |
| RTV-Bench | HKUST(GZ) | GPT-4o |
| SEED-Bench-2 | Tencent AI Lab | SEED-LLaMA |
| MDK12-Bench | Wuhan University | Gemini2-thinking |
| M4U | Chinese Academy of Sciences | GPT-4o |
| OCR-Reasoning | South China University of Technology | DouBao-1.5-Vision-Pro |
| InfiniBench | Monash University | GPT-4o |
| OST-Bench | Shanghai AI Laboratory | Human-Level |
| SAV-Caption | Google DeepMind | VoCap |
| M2KR | University of Cambridge | RA-VQAv2 w/ PreFLMR |
| VAGUE | UC Berkeley | Human |
| Video-Bench | Peking University | Video-ChatGPT |
| CMMMU | University of Waterloo | GPT-4o(202405130) |
| SEED-Bench-2-Plus | Tencent AI Lab | GPT-4V |
| MSQA | Peking University | LEO (FT) |
| SpatialViz-Bench | Chinese Academy of Sciences | Gemini-2.5-pro |
| SciVideoBench | Stanford University | Gemini-2.5-Pro |
| CONFLICTVIS | The Chinese University of Hong Kong | GPT-4o |
| HLV-1K | Nanyang Technological University | LLaVA-Video (72B) |
| Uni-MMMU | Shanghai Artificial Intelligence Laboratory | GPT4.1 + GPT-image |
| MuCR | Google DeepMind | Human |
| SIV-Bench | Tsinghua University | Gemini-2.5-Pro |
| MMReason | University of Science and Technology of China | GPT-4o-1120 |
| ArtifactsBench | Tencent | GPT-5 |
| TDBench | Columbia University | Gemini 2.5 Pro |
| MAVERIX | Carnegie Mellon University | Claude 3.5 Sonnet |
| MMR-V | Chinese Academy of Sciences | Human |
| MMSearch-Plus | The University of Hong Kong | o3 |
| GeoQA | Sun Yat-Sen University | Human (Text-Diagram) |
| FigureQA | Université de Montréal | Relation Network (RN) |
| VS-Bench | University of Science and Technology of China | o3 |
| Multimodal Inconsistency Reasoning | University of California, Santa Cruz | o1 (1217) |
| SciVerse | The Chinese University of Hong Kong | GPT-4o |
| VIP | University of California, Santa Barbara | VICUNA-13B |
| Vision LLM Safety Benchmark | University of Oxford | GPT-4V |
| FCMR | Hanyang University | Claude 3.5 Sonnet |
| MMWorld | Microsoft | GPT-4o |
| HumanVBench | Alibaba Group | Human |
| VideoReasonBench | Nanjing University | Human |
| Visual Commonsense Reasoning | University of Washington | Human Performance |
| CLEVR | Stanford University | Human |
| ViC-Bench | University of Science and Technology of China | o3 |
| OK-VQA | University of Washington | Prophet |
| SPORTU | University of California, Santa Barbara | Qwen2-VL-72B |
| MVTamperBench | University of Washington | VILA1.5-40b |
| VideoMathQA | Google Research | GPT-o4-mini |
| Spatial-MM | Monash University | GPT-4o |
| WikiMixQA | Google DeepMind | Human Experts |
| BLINK-Twice | Shanghai Artificial Intelligence Laboratory | Gemini-2.5-pro ✩ |
| UNPIE | Yonsei University | GPT-4V |
| CompreCap | Ant Group | Human |
| TIR-Bench | University of Southern California | o3-TU |
| DrVD-Bench | Tsinghua University | Gemini 2.5 Pro |
| MMLU-Reason | Lehigh University | Gemini-2.5 Pro |
| VisualToolBench | University of Illinois at Urbana-Champaign | GPT-5-think |
| ActivityNet-QA | Zhejiang University | E-SA |
| ROVER | University of Southern California | Nano Banana |
| CrossWordBench | University of Washington | DeepSeek-R1 |
| VidComposition | Arizona State University | Gemini-2.5-flash-preview |
| HumaniBench | Vector Institute | Qwen2.5-7B |
| LLMGeo | Vanderbilt University | Gemini |
| EgoExoBench | Shanghai AI Laboratory | Gemini 2.5 Pro |
| II-Bench | University of Waterloo | Qwen-VL-MAX |
| DIVE | Northeastern University | GRT |
| A-Bench | Shanghai Jiao Tong University | HUMAN (BEST) |
| IconQA | UCLA | Human |
| 11Plus-Bench | Chinese Academy of Sciences | GPT-o3 |
| AVA-Bench | The Ohio State University | SigLIP-2 |
| MaRs-VQA | University of Illinois at Urbana-Champaign | Human |
| UrBench | Wuhan University | Human |
| MUCAR | Tsinghua University | Human |
| Causal-VidQA | Shanghai Jiao Tong University | Human |
| Reasoning-OCR | Wuhan University | GPT-4o-20240806 |
| MMEvalPro | Alibaba Group | Human (Graduate Student) |
| OmniVideoBench | Nanjing University | Gemini-2.5-Pro |
| PSG | Nanyang Technological University | PSGTR (60 epochs) |
| VISFACTOR | The Chinese University of Hong Kong | Claude 3.7 Sonnet |
| FaceXBench | Johns Hopkins University | Qwen2-VL-72b-Instruct |
| REVERIE | Georgia Institute of Technology | Human |
| Argus Inspection | Shanghai Artificial Intelligence Laboratory | GPT-4.1-2025-04-14 |
| VFaith-Bench | Chinese Academy of Sciences | Gemini-2.5 |
| MMComposition | Microsoft | InternVL2-40B |
| UNO-Bench | Meituan | Gemini-2.5-Pro |
| LIME | Chinese Academy of Sciences | InternVL-2 (40B) |
| FlowVQA | Google Research | GPT-4V |
| Social Genome | Carnegie Mellon University | Gemini-1.5-Flash |
| MIHBench | Xiamen University | LLaVA-NeXT-Interleave + OURS (DAB) |
| MVU-Eval | Nanjing University | Gemini 2.5 Pro |
| HumanPCR | Chinese Academy of Sciences | o3 |
| SeriesBench | Beihang University | GPT-4o + PC-DCoT |
| LVLM-eHub | Shanghai AI Laboratory | InstructBLIP |
| LogicOCR | Wuhan University | Gemini-2.5-Pro |
| LVLM-Playground | Zhejiang University | GPT-4o |
| SMIR-BENCH | California Institute of Technology | Claude-3-Opus-20240229 |
| ChartMind | Chinese Academy of Sciences | GPT-4o |
| CURE | University of Illinois at Urbana-Champaign | Human |
| Ego-QA and MAD-QA | National University of Singapore | Human |
| Q-Bench+ | Shanghai Jiao Tong University | BlueImage-GPT |
| Visual7W | Stanford University | Human (Question + Image) |
| EXAMS-V | MBZUAI | GPT-4 (w/ OCR, captions) |
| VNBench | Chinese Academy of Sciences | Gemini 1.5 Pro |
| MCUB | Alibaba Group | DAMC |
| BlackSwanSuite | University of British Columbia | Human |
| PRISM-Bench | Apple | SkyWork R1V3-38B |
| TIMEBench | Harbin Institute of Technology | InternVL 2.5 + TIME |
| TempVS | Utrecht University | InternVL2.5 78B-MPO |
| MixEval-X | Google DeepMind | Claude 3.5 Sonnet |
| OmniEval | Huawei Noah’s Ark Lab | gemini-2.5-pro-preview-05-06 |
| Multi-image Relational Association | University of Waterloo | GPT4o |
| EgoGazeVQA | Beihang University | Qwen2.5-VL-72B |
| MIRAGE | Tsinghua University | QwenVL-2.5-72B |
| HumanSense | Ant Group | InternVL3-8B |
| HEMM | Carnegie Mellon University | GEMINI |
| AVUT | University of Cambridge | Gemini 1.5 Pro |
| MAC | Fudan University | Step-3 |
| V2P-Bench | University of Science and Technology of China | Human Performance |
| ReMI | Google DeepMind | Human |
| CompBench | The Ohio State University | GPT-4o |
| VisualTrans | Chinese Academy of Sciences | o3 |
| CMMU | Beijing Normal University | GPT-4V |
| MATE | University of Edinburgh | Human |
| GDI-Bench | Shanghai Artificial Intelligence Laboratory | GDI-Model |
| EEmo-Bench | Cardiff University | GPT-4o |
| LogicBench | Sun Yat-Sen University | Human |
| POLYMATH | Google DeepMind | Claude-3.5 Sonnet |
| CVBench | Sun Yat-Sen University | GPT-4o |
| VisioMath | Beijing Normal University | Gemini 2.5 Pro |
| PhysicsArena | The Hong Kong University of Science and Technology (Guangzhou) | Qwen-VL-Max |
| VideoAutoArena | National University of Singapore | GPT-4o |
| MMDocBench | National University of Singapore | GPT-4o |
| HIS-Bench | Chinese Academy of Sciences | HIS-GPT |
| oVQA | University of Freiburg | BLIP-2 OPT |
| VideoGLUE | Cornell University | CoCa |
| MMPerspective | Carnegie Mellon University | Gemini-2-flash (CoT) |
| Event-Bench | Chinese Academy of Sciences | GPT-4o |
| MATHLENS | Yonsei University | Gemini-2.5-Flash (Thinking) |
| HVSBench | City University of Hong Kong | Human |
| Mementos | University of Maryland, College Park | GPT-4V (Sequential) |
| BabelBench | ByteDance | ChatGPT 4 |
| MMRel | Nanyang Technological University | GPT-4o |
| TimeLogic | University of Central Florida | SeViLA |
| P²GB | Peking University | GPT-4V |
| PCA-Bench | Alibaba Group | GPT4-Vision-1106 |
| Corrupted Visual Genome | Carnegie Mellon University | HiKER-SGG |
| NeMoBench | Stanford University | Human Experts |
| CameraBench | Academia Sinica | Gemini 2.0 Flash |
| AlignMMBench | Tsinghua University | Qwen2-VL |
| Earth Observation VLM Benchmark | MIT | GPT-4V |
| IRR | Hokkaido University | LLaVA-NeXT (Mistral-7B) |
| SIRI-Bench | Sun Yat-Sen University | Doubao-1.5-pro (Textual Rep.) |
| ASCIIEval | Carnegie Mellon University | GPT-5 |
| Video Reasoning Evaluation Suite | ETH Zurich | Human |
| Face-Human-Bench | Beijing Normal University | InternVL-Chat-v1.2-Plus |
| Human-MME | National University of Singapore | GLM-4.5V |
| CogBench | Shanghai Jiao Tong University | Oracle |
| ReadBench | University of Trier | Qwen2.5-VL 7B |
| WebQuest | Google DeepMind | GPT-4V |
| JourneyBench | UCLA | GPT-4o |
| VQA-CP | Allen Institute for Artificial Intelligence | GVQA |
| VidText | IIE, CAS | Human |
| HumanVideo-MME | Tencent YouTu Lab | Qwen2.5-VL 32B |
| AbilityLens | Monash University | Qwen2.5VL-72b |
| NTSEBENCH | University of Utah | OpenAI o1-preview |
| MUSIC-AVQA | Renmin University of China | Our method (Spatio-Temporal Grounding) |
| SNS-Bench-VL | University of Oxford | Gemini-2.5-pro-exp-03-25 |
| NPHardEval4V | University of Michigan | GPT-4V |
| Clue-Visual QA | Shanghai University | Qvq-72B-preview |
| CODIS | Chinese Academy of Sciences | Human |
| BEAF | Yonsei University | Shikra (7B) |
| Dr.V-Bench | NUS | Human |
| TEMROBBENCH | National University of Singapore | mPLUG-Owl3 |
| Compositional Visual Relations | CNRS | ResNet-50 |
| MESH | Huawei Noah’s Ark Lab | InternVL2.5-78B |
| MMTABQA | UCLA | GPT-4o |
| COVER | Westlake University | InternVL2.5-78B |
| SportR | University of California, Santa Barbara | Qwen-VL-7B (SFT+RL) |
| ARB | Aalto University | GPT-4o |
| TUNA | Northeastern University | GPT-4o (0806) |
| VL-CheckList | Zhejiang University | ViLT |
| VideoRewardBench | University of Science and Technology of China | Gemini-2.5-Pro (2025-06) |
| Illusory VQA | IUST | Human |
| MT-Video-Bench | Fudan University | Gemini 2.5 Pro |
| GlitchBench | University of Alberta | GPT-4V |
| CAMEL-Bench | Aalto University | GPT-4o |
| ImplicitQA | University of Central Florida | Human Baseline |
| M3STR | Tianjin University | Qwen2.5-VL 72B-Instruct |
| ViLBias | Vector Institute | RoBERTa + CLIP (FT) |
| TEMPO | UC Berkeley | MLLC (Ours) Context Sup. Test |
| MIR | Beijing University of Posts and Telecommunications | Qwen2-VL |
| Video-OCR Bench | Huazhong University of Science and Technology | Qwen2-VL-7B |
| AVEB | University of Cambridge | FAVOR 13B (audio-visual) |
| GePBench | Nanjing University | Human |
| REBUS | MATS | GPT-4o |
| UAL-Bench | Texas A&M University | VideoLLaMA |
| MangaUB | The University of Tokyo | GPT-4o |
| ING-VP | MBZUAI | Claude-3.5 Sonnet |
| VideoVista-CulturalLingo | Harbin Institute of Technology | Gemini-2.0-Flash |
| CSVQA | Skywork AI | o1-preview |
| AVTrustBench | University of Toronto | GPT-4o |
| YouCookII-TVS | University of Illinois at Urbana-Champaign | GPT-4-turbo |
| Med-MIM | The Chinese University of Hong Kong | Med-Mantis |
| EgoCVR | University of Tübingen | TFR-CVR (Ours) |
| VidSitu | University of Southern California | Human |
| OV-VG | Beihang University | Ours (Swin-T, O365+, RefC finetune) |
| AURA | University of Maryland, College Park | Ola |
| VALUE | Tsinghua University | craig.starr (ensemble) |
| ODI-Bench | Tianjin University | o3 |
| MM-SpuBench | University of Illinois at Urbana-Champaign | GPT-4V |
| NoTeS-Bank | UNC-Chapel Hill | Human Baseline |
| FinMMR | Beijing University of Posts and Telecommunications | Claude 3.7 Sonnet (64K) |
| PACS | Carnegie Mellon University | Human |
| MultiVENT-G | Johns Hopkins University | GPT-4o |
| MFC-Bench | Beijing University of Posts and Telecommunications | Human |
| MMR | University at Buffalo | Claude 3.5 Sonnet |
| MAIA | University of Pisa | Qwen2.5-VL-72B |
| MOMENTS | University of Michigan | LLaVA-Video-72B |
| VS-TDX | KAIST | GPT-4o |
| MERLIM | King Abdullah University of Science and Technology (KAUST) | MiniGPT-4 (Vicuna-7B v0) |
| XD-Violence | The Chinese University of Hong Kong | VA-GPT (Vicuna-7B) |
| VGSI | University of Pennsylvania | Human |
| MTMEUR | Hefei University of Technology | Our Method |
| MoHoBench | Fudan University | Llama-3.2-90B-Vision-Instruct |
| Multi-Dimensional Insights | Huazhong University of Science and Technology | GPT-4o |
| MAPWise | University of Utah | Human |
| VisScience | Beihang University | Claude3.5-Sonnet |
| BenchLMM | Northeastern University | GPT-4V |
| Sherlock | University of Washington | CLIP (RN50x64) + multitask clue learning |
| OBI-Bench | Shanghai Jiao Tong University | GPT-4O (ver. 0806) |
| VER-Bench | Chinese Academy of Sciences | Gemini-2.5 Pro Preview |
| IQBench | University of Alabama at Birmingham | o4-mini |
| TennisTV | New York University | GPT-4.1 |
| Charting New Territories | University of Cambridge | GPT-4V |
| O-Bench | Peking University | Human |
| KnowDR-REC | LMU Munich | Qwen-VL-Max |
| Cops-Ref | Tencent AI Lab | MattNet-Mine |
| CausalVLBench | University of Arkansas | Gemini-2.0-Flash |
| BloomVQA | Boston University | GPT-4V |
| EgoSDQES | Stanford University | LaViLa + QR-Adapter |
| MultiStAR | The University of Melbourne | Qwen2-VL-72B* |
| ViLMA | University of Amsterdam | BLIP-2 |
| VALSE | University of Amsterdam | ViLBERT 12-in-1 |
| MR²-Bench | University of Science and Technology of China | Seed-1.6-Embedding |
| VQA-Rephrasings | Georgia Institute of Technology | BAN + CC |
| VISE | Kyungpook National University | InternVL 2.5 26B |
| MM-BigBench | Northeastern University | InstructBLIP |
| QL-Bench | Shanghai Jiao Tong University | InternLM-XComposer2d5 (7B) |
| MM-InstructEval | Northeastern University | GPT-4V |
| MetaCLUE | | FT CLIP (ViT-L/14) |
| MMRobustness | University of Illinois at Urbana-Champaign | CLIP ZS |
| MathOPEval | Alibaba Group | Human |
| TouchStone | Alibaba Group | GPT-4V |
| MET-Bench | The University of Texas at Austin | GPT-4o |
| DesignProbe | Harbin Institute of Technology | GPT-4V |
| IQUAD V1 | University of Washington | Human |
| iWISDM | Université de Montréal | Human |
| MSRVTT-P | University of Central Florida | FIT (zs) |
| FRAMES-VQA | Georgia Institute of Technology | SPD LoRA |
| AccidentBench | UCL | GPT 5 |
| Moments-OVRE | Fudan University | Ours (CLIP-ViT + GPT-2) |
| Contra4 | University of Pennsylvania | CREMA |
| MaRVL-QA | | o4-mini |
| CRIC | Chinese Academy of Sciences | ViLBERT+l_att |
| Bias in the Picture | Vector Institute for AI | Qwen2.5-VL |
| MOSABench | Hefei University of Technology | mPLUG-owl-7B |
| Multi-Physics | The Chinese University of Hong Kong, Shenzhen | Gemini-2.5-Pro |
| AstroChart | Zhejiang Lab | Gemini-2.5-Pro |
| UI2V-Bench | Huawei Noah’s Ark Lab | Hailuo |
| JRDB-Reasoning | Monash University | InternVL 2.5 |
| EasyARC | ETH Zurich | Claude 3.7 Sonnet |
| UWBench | Northwestern Polytechnical University | GPT-5 |
| IMAGECODE | Samsung | Human Performance |
| SGG Benchmark | Microsoft Cloud AI | RelDN |
| ReForm-Eval | Northeastern University | BLIP-2_F |
| PARROT-360V | Redblock AI | GPT-4o |
| EGOILLUSION | University of Maryland, College Park | Human Evaluation |
| MANBench | Huazhong University of Science and Technology | Human (Best) |
| xGQA | University of Cambridge | mBERT_Ada |
| SEAM | University of Toronto | GPT-5-mini |
| ChEF | Shanghai Artificial Intelligence Laboratory | GPT-4V |
| LAVA | Beijing Institute of Technology | Lava |
| Trust-videoLLMs | Tsinghua University | Claude 3.7 Sonnet |
| MME-CC | Nanjing University | Human |
| GOBench | Shanghai AI Laboratory | Gemini-2.5Pro |
| GUESSBENCH | Shanghai Jiao Tong University | Qwen2.5-VL-72B |
| CLEVR Mental Rotation Tests | McGill University | Upper bound (canonical views only) |
| ConViS-Bench | University of Trento | LLaVA-OV-7B |
| Common-O Bench | Meta | Llama 4 Instruct Scout |
| Compass Direction Reasoning | Beijing Institute of Technology | Gemini 1.5 Pro |
| CLEVR-Ref+ | Northwestern Polytechnical University | IEP-Ref (700K prog.) |
| NEMO | The University of Tokyo | Human |
| ReasonBench | Beijing Electronic Science & Technology Institute | Human Baseline |
| MeViS-X | Deakin University | Planner-Refiner |
| IGLUE | University of Cambridge | UC₂ |
| Spacewalk-18 | Brown University | Caption-enhanced LLM |
| VisualQuest | Dalian University of Technology | Gemini-2.0-Flash-exp |
| MEBench | Georgia Institute of Technology | CogVLM |
| ST-VQA | Computer Vision Center | VTA |
| VLUE | HKUST | METER |
| CompareBench | OPPO | Gemini 2.5 Pro |
| gCOG | IBM Research | SSTfmr |
| MedVidCQA | Chinese Academy of Sciences | CCGS (Our Method) |
| Seeing Culture Benchmark | Singapore Management University | GPT-o3 |
| LongInsightBench | Peking University | Gemini2.5-Flash |
| Compositional Temporal Grounding | Zhejiang University | VISA |
| VLQA | Arizona State University | Human |
| SE-CE | Texas A&M University | GPT-4.1 Mini |
| ComBo | Chinese Academy of Sciences | ViT-B/16 (fine-tuned) |
| MultipanelVQA | University of California, Santa Cruz | Human |
| How2R | Microsoft Dynamics 365 AI Research | HERO |
| MMA-ASIA | Shanghai University of Finance and Economics | GPT-4o |
| PARC | University of Stuttgart | InternVL2 40B |
| WildQA | University of Michigan - Ann Arbor | Human |
| R²-Bench | Carnegie Mellon University | SEEM |
| MCTBench | ByteDance | GPT-4V |
| VQA-GEN | Arizona State University | ViLT |
| VQA-LOL | Arizona State University | LOL (full) |
| Probe | University of Central Florida | BridgeTower |
| SHOP-VRB | Imperial College London | XNM GT/GT |
| HumanCog | UCLA | HumanCog (Ours large) |
| VIVA+ | The Hong Kong Polytechnic University | GPT-4.1 |
| M-EV² | Beihang University | MEEL |
| G-VUE | UCLA | ViT-16-CLIP |
| Visual Genome | Auburn University | EM-Grounding (None) |
| VLM@school | Hof University of Applied Sciences | QwenVL2.5 32B |
| Fine-Grained Image Analysis Benchmark | University of Cambridge | claude-3-5-sonnet-20241022 |
| INTERCHART | University of Pennsylvania | Gemini-1.5-Pro |
| RPTS-Eval | Harbin Institute of Technology | GPT-4o |
| CFVBench | Chinese Academy of Sciences | gemini-2.5-flash |
| V-HUB | Shanghai Jiao Tong University | Qwen-2.5-VL-72B |
| AeroEye-v1.0 | Ohio State University | THYME |
| SPOT Prober | University of Oxford | InternVideo |
| VL-GLUE | Arizona State University | ViLT (Fine-tuned) |
| SPOT | University of Oxford | ALPRO |
| ImageNetVC | Shanghai AI Lab | LLaMA-65B |
| VQArt-Bench | University of Zurich | Gemini 2.5 |
| SNARE | South China University of Technology | BLIP |
| SIMMC | Facebook Assistant | SimpleTOD+MM |
| Res-Bench | University of Science and Technology of China | mPLUG-Owl3 |
| EmoBench-Reddit | Tianjin University | gemini-2.5-pro |
| PerceptualQA | Beijing Normal University | Human Baseline |
| CMR-SPB | Sony AI | Gemini 2.0 Flash |
| IndicVisionBench | Krutrim AI | Gemini-2.5 Flash |
| VLURes | NAIST | Human Performance |
| ViMoNet-Bench | American International University Bangladesh | ViMoNet |
| ISO-Bench | Stony Brook University | Human* |
| BLEnD-Vis | Sogang University | GPT-4o |
| Drill-down | University of Virginia | Drill-down_3x256 |
| MMAO-Bench | Meituan | Gemini-2.5-Pro |
| TrUMAn | National Taiwan University | Oracle (BERT) |
| PISA-Bench | DFKI | GPT-4o |
| HVQR | Sun Yat-Sen University | KM-net |
| YouCookII | The University of Tokyo | DORi |
| YouMakeup VQA Challenge | Renmin University of China | SCDM+ (I3D features) |