From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models

BibTeX


@misc{zhou2025perceptioncognitionsurvey,
      title={From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models},
      author={Chenyue Zhou and Mingxuan Wang and Yanbiao Ma and Chenxu Wu and Wanyi Chen and Zhe Qian and Xinyu Liu and Yiwei Zhang and Junhao Wang and Hengbo Xu and Fei Luo and Xiaohua Chen and Xiaoshuai Hao and Hehan Li and Andi Zhang and Wenxuan Wang and Lingling Li and Zhiwu Lu and Yang Lu and Yike Guo},
      year={2025},
      eprint={2509.25373},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.25373},
}
