alphaXiv

28 Nov 2025

computer-science artificial-intelligence generative-models

A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

Georgia Institute of Technology

University of Wisconsin-Madison

Purdue University

University of Liverpool

JTB Technology Corp.

Stockton University

Nomad Sustaintech LTD

AppCubic USA

This comprehensive survey synthesizes the current landscape of Multimodal Large Language Models (MLLMs), detailing their architectures, training methodologies, and diverse applications across vision-language tasks. It also outlines the persistent technical challenges and the ethical considerations that bear on the responsible development of MLLMs and their integration into society.
