AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
BibTeX
@misc{bengio2025alignvlmbridgingvision,
  title={{AlignVLM}: Bridging Vision and Language Latent Spaces for Multimodal Understanding},
  author={Yoshua Bengio and Chao Wang and Sai Rajeswar and Bang Liu and Christopher Pal and Tianyu Zhang and Suyuchen Wang and Spandana Gella and Perouz Taslakian and Nicolas Chapados and David Vazquez and Aarash Feizi and Enamul Hoque and Marco Pedersoli and Issam H. Laradji and Abhay Puri and Sathwik Tejaswi Madhusudhan and Ahmed Masry and Juan A. Rodriguez and Xiangru Jian and Pierre-André Noël and Akshay Kalkunte Suresh},
  year={2025},
  eprint={2502.01341},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.01341},
}