Researchers systematically analyzed visual layer selection in Multimodal Large Language Models (MLLMs), showing that fusing features from shallow, middle, and deep Vision Transformer layers via simple concatenation outperforms both the conventional reliance on deep layers alone and more complex fusion strategies.
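A minimal sketch of the fusion idea, not the paper's code: features from a few ViT layers are concatenated along the channel dimension and projected into the LLM embedding space. The layer indices and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiLayerConcatFusion(nn.Module):
    """Concatenate shallow/middle/deep ViT features, then project to the LLM."""

    def __init__(self, vit_dim=1024, llm_dim=4096, num_selected=3):
        super().__init__()
        # A single projection applied after channel-wise concatenation.
        self.proj = nn.Linear(vit_dim * num_selected, llm_dim)

    def forward(self, hidden_states, layer_ids=(4, 12, 23)):
        # hidden_states: one [batch, num_patches, vit_dim] tensor per ViT layer
        # (e.g., obtained with output_hidden_states=True in common ViT APIs).
        picked = [hidden_states[i] for i in layer_ids]
        fused = torch.cat(picked, dim=-1)   # [batch, patches, vit_dim * 3]
        return self.proj(fused)             # [batch, patches, llm_dim]

# Toy usage with random per-layer features from a 24-layer ViT.
states = [torch.randn(2, 196, 1024) for _ in range(24)]
tokens = MultiLayerConcatFusion()(states)
print(tokens.shape)  # torch.Size([2, 196, 4096])
```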
The MULTICONIR benchmark was developed to systematically evaluate information retrieval and reranking models on natural-language queries that combine multiple conditions, revealing that current state-of-the-art models suffer significant performance drops on such multi-condition queries and lack robust relevance monotonicity and format invariance. Advanced general-purpose LLMs, such as GPT-4o, handled these complex retrieval scenarios markedly better.
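To make the relevance-monotonicity property concrete, here is an illustrative sketch (my own, not the benchmark's code): a retriever's score should rise as a document satisfies more of the query's conditions. The scorer and documents below are toy stand-ins.

```python
def is_relevance_monotonic(score, query, docs_by_condition_count):
    """True if the score strictly increases as documents satisfy more
    conditions; docs_by_condition_count[k] satisfies exactly k conditions."""
    scores = [score(query, doc) for doc in docs_by_condition_count]
    return all(a < b for a, b in zip(scores, scores[1:]))

# Toy three-condition query and documents ordered by conditions satisfied.
conditions = ["wireless", "noise-cancelling", "headphones"]
docs = [
    "wired earbuds",                         # satisfies 0 conditions
    "wireless earbuds",                      # 1
    "wireless noise-cancelling headset",     # 2
    "wireless noise-cancelling headphones",  # 3
]
# Stand-in scorer: count condition substrings present (query is unused here).
score = lambda q, d: sum(c in d for c in conditions)
print(is_relevance_monotonic(score, " ".join(conditions), docs))  # True
```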
A data-efficient framework for Thai text-to-speech synthesis combines phoneme-tone adaptive modeling with specialized preprocessing pipelines to handle Thai's complex linguistic features, such as its five lexical tones and unsegmented script, achieving high-fidelity speech synthesis and zero-shot voice cloning while requiring significantly less training data than traditional approaches.
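An illustrative sketch (not the paper's pipeline) of what phoneme-tone preprocessing can look like: each Thai syllable becomes a phoneme string plus an explicit tone tag, yielding a sequence a tone-adaptive acoustic model can consume. The tiny lexicon and tone indexing (0 = mid, 1 = low, 2 = falling, 3 = high, 4 = rising) are assumptions for the example.

```python
# Hypothetical mini-lexicon: word -> list of (phoneme, tone index) pairs.
LEXICON = {
    "สวัสดี": [("sa", 1), ("wat", 1), ("dii", 0)],  # "hello": low, low, mid
    "ครับ": [("khrap", 3)],                          # polite particle: high
}

def to_phoneme_tone_tokens(text):
    """Flatten words into interleaved phoneme and tone tokens,
    e.g. ['sa', '<tone_1>', 'wat', '<tone_1>', ...]."""
    tokens = []
    for word in text.split():
        for phoneme, tone in LEXICON.get(word, []):
            tokens.append(phoneme)
            tokens.append(f"<tone_{tone}>")
    return tokens

print(to_phoneme_tone_tokens("สวัสดี ครับ"))
# ['sa', '<tone_1>', 'wat', '<tone_1>', 'dii', '<tone_0>', 'khrap', '<tone_3>']
```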