Maitrix
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
27 Jun 2025

A new atomic evaluation framework, WM-ABench, systematically assesses how well Vision-Language Models function as internal world models. It reveals significant limitations in their spatial, temporal, and dynamic scene understanding and in their causal, transitive, and compositional predictions, with performance falling substantially short of humans.