From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
BibTeX
@misc{zhou2025perceptioncognitionsurvey,
title={From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models},
author={Chenyue Zhou and Mingxuan Wang and Yanbiao Ma and Chenxu Wu and Wanyi Chen and Zhe Qian and Xinyu Liu and Yiwei Zhang and Junhao Wang and Hengbo Xu and Fei Luo and Xiaohua Chen and Xiaoshuai Hao and Hehan Li and Andi Zhang and Wenxuan Wang and Lingling Li and Zhiwu Lu and Yang Lu and Yike Guo},
year={2025},
eprint={2509.25373},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.25373},
}