OpenGVLab · Shanghai AI Laboratory
A Survey of Reinforcement Learning for Large Reasoning Models

This survey paper systematically synthesizes advancements in Reinforcement Learning (RL) for Large Reasoning Models (LRMs), moving beyond human alignment to focus on enhancing intrinsic reasoning capabilities through verifiable rewards. It identifies key components, challenges, and future directions for scaling RL towards Artificial SuperIntelligence (ASI).
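
The survey's organizing idea, Reinforcement Learning with Verifiable Rewards (RLVR), replaces a learned reward model with a programmatic checker. A minimal sketch of that setup, assuming a \boxed{...} answer convention; the helper names are hypothetical:

```python
# Minimal sketch of a verifiable reward (RLVR): the reward comes from a
# programmatic checker rather than a learned reward model. The \boxed{}
# convention and helper names are assumptions for illustration.
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the content of the last \\boxed{...} span in a completion."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 iff the extracted answer matches the reference."""
    pred = extract_final_answer(completion)
    return 1.0 if pred == gold_answer else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}.", "42"))  # 1.0
```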

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

A comprehensive survey formally defines Agentic Reinforcement Learning (RL) for Large Language Models (LLMs) as a Partially Observable Markov Decision Process (POMDP), distinct from conventional LLM-RL, and provides a two-tiered taxonomy of capabilities and task domains. The work consolidates open-source resources and outlines critical open challenges for the field.
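
To ground the POMDP formulation, here is a toy version of the loop it formalizes: the agent receives partial observations, emits text or tool-call actions, and is rewarded on task completion. `ToolEnv` and `llm_policy` are illustrative stand-ins, not any particular framework's API.

```python
# Hedged sketch of the POMDP loop this formalizes: partial observations in,
# text/tool actions out, reward on completion. `ToolEnv` and `llm_policy`
# are toy stand-ins, not any particular framework's API.
def llm_policy(history):
    """Placeholder policy: call a tool first, then answer."""
    return "search('weather Shanghai')" if len(history) == 1 else "ANSWER: sunny"

class ToolEnv:
    """Toy partially observable environment with a single tool."""
    def reset(self):
        return "User: what's the weather in Shanghai?"
    def step(self, action):
        if action.startswith("ANSWER:"):
            return None, 1.0, True          # obs, reward, done
        return "Tool result: sunny, 24C", 0.0, False

env = ToolEnv()
obs, done, history = env.reset(), False, []
while not done:
    history.append(obs)
    obs, reward, done = env.step(llm_policy(history))
print("episode reward:", reward)
```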

FlowRL: Matching Reward Distributions for LLM Reasoning

FlowRL presents a policy optimization algorithm for large language models that leverages GFlowNet principles to match reward distributions rather than merely maximizing expected reward. The approach yields improved performance on math and code reasoning benchmarks and notably increases the diversity of generated solutions.
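
A sketch of what distribution matching means operationally, in the spirit of a trajectory-balance objective with a learnable log-partition term; the tensor shapes, the scalar `beta`, and the per-prompt `log_z` are assumptions, not the paper's exact formulation:

```python
# Hedged sketch of a trajectory-balance-style objective in FlowRL's spirit:
# drive log pi(y|x) toward beta*r(x,y) - log Z so the policy matches the
# reward distribution instead of maximizing its mode. Shapes, `beta`, and
# the per-prompt `log_z` are assumptions, not the paper's exact formulation.
import torch

def distribution_matching_loss(seq_logprob, reward, log_z, beta=1.0):
    """
    seq_logprob: (B,) summed token log-probs of sampled completions
    reward:      (B,) scalar reward per completion
    log_z:       learnable scalar log-partition estimate for the prompt
    """
    residual = log_z + seq_logprob - beta * reward
    return (residual ** 2).mean()

log_z = torch.zeros((), requires_grad=True)
seq_logprob = torch.tensor([-12.0, -30.0, -8.0], requires_grad=True)
reward = torch.tensor([1.0, 0.0, 1.0])
print(distribution_matching_loss(seq_logprob, reward, log_z))
```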

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
27 Aug 2025

InternVL3.5, developed by the InternVL Team at Shanghai AI Laboratory, introduces a new family of open-source multimodal models that achieves state-of-the-art results across 35 benchmarks, narrowing the performance gap with commercial systems to 3.9%, while delivering a 4.05x inference speedup through novel architectural and training strategies.

Flow-GRPO: Training Flow Matching Models via Online RL

Researchers from MMLab, Tsinghua University, Kuaishou Technology, and Shanghai AI Lab developed Flow-GRPO, a framework that integrates online policy-gradient reinforcement learning into flow matching models. To overcome the determinism and sampling cost of flow models, it converts the sampling ODE into an SDE that matches its marginals and reduces the number of denoising steps during training rollouts. The method significantly improves compositional image generation, visual text rendering, and human preference alignment, reaching up to 95% GenEval accuracy on SD3.5-M.
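
Of the two ingredients, denoising reduction is the simplest to illustrate: RL rollouts are sampled with far fewer integration steps than evaluation uses, making data collection cheap. Below is a generic Euler sampler over a toy velocity field, not Flow-GRPO's actual sampler:

```python
# Hedged sketch of denoising reduction: collect cheap RL rollouts with a
# coarse sampler and keep the fine schedule for evaluation. Generic Euler
# integration of a toy velocity field, not Flow-GRPO's actual sampler.
import numpy as np

def euler_sample(velocity, x0, n_steps):
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + velocity(x, i * dt) * dt
    return x

v = lambda x, t: 1.0 - x                         # toy field flowing toward 1.0
x0 = np.zeros(4)
train_sample = euler_sample(v, x0, n_steps=10)   # cheap rollouts for RL
eval_sample = euler_sample(v, x0, n_steps=100)   # full schedule at test time
print(train_sample.round(3), eval_sample.round(3))
```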

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

FlashSloth, developed by researchers from Xiamen University, Tencent Youtu Lab, and Shanghai AI Laboratory, introduces a Multimodal Large Language Model (MLLM) architecture that significantly improves efficiency through embedded visual compression. The approach reduces visual tokens by 80-89% and achieves 2-5 times faster response times, while maintaining highly competitive performance across various vision-language benchmarks.

Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials

Researchers from Fudan University and Shanghai AI Laboratory introduce "Make-it-Real," a framework that leverages GPT-4V to automatically paint 3D objects with realistic materials from albedo-only inputs. It generates a full suite of SVBRDF maps, achieving up to 77.8% human user preference and 84.8% GPT evaluation preference for refined objects over unrefined ones, significantly enhancing visual authenticity.

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

This paper identifies and characterizes a universal policy entropy collapse in reinforcement learning for large language models (LLMs), revealing an empirical law that links performance to entropy. It further provides a mechanistic understanding of this phenomenon through covariance analysis and proposes two covariance-aware regularization methods, Clip-Cov and KL-Cov, which successfully maintain higher entropy and improve LLM reasoning performance on math and coding tasks.
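
The mechanistic claim is that entropy change is driven by the covariance between token log-probabilities and advantages, so both proposed methods intervene on the highest-covariance tokens. A hedged sketch of that token-selection step only; the threshold fraction is an assumed hyperparameter, and the actual methods clip gradients or add a KL penalty rather than merely masking:

```python
# Hedged sketch of the covariance quantity behind both methods: per-token
# products of centered log-probs and advantages. Clip-Cov/KL-Cov intervene
# on the highest-covariance tokens (clipping gradients or adding a KL
# penalty); the plain mask below only illustrates the selection step.
import torch

def high_cov_mask(token_logprobs, advantages, frac=0.02):
    """Flag the top-`frac` tokens by centered logprob-advantage product."""
    cov_terms = (token_logprobs - token_logprobs.mean()) * \
                (advantages - advantages.mean())
    k = max(1, int(frac * cov_terms.numel()))
    return cov_terms >= cov_terms.topk(k).values.min()

logp, adv = torch.randn(1000), torch.randn(1000)
mask = high_cov_mask(logp, adv)
print("tokens singled out for regularization:", int(mask.sum()))
```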

Towards a Unified View of Large Language Model Post-Training

Researchers from Tsinghua University and Shanghai AI Lab introduce a unified theoretical framework that views various Large Language Model (LLM) post-training algorithms as a single optimization process. Based on this framework, they propose Hybrid Post-Training (HPT), a dynamic algorithm that adaptively switches between Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) signals, achieving state-of-the-art performance on mathematical reasoning benchmarks.
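
A minimal sketch of the switching rule this describes, assuming a binary verifier and per-prompt rollouts; the samplers are toy stand-ins and the rule is a simplification of HPT's actual criterion:

```python
# Hedged sketch of the switching rule: per prompt, train on the RL signal
# when the policy's own rollouts contain a success, otherwise imitate a
# reference demonstration (SFT). The verifier and samplers are toy stand-ins.
def verify(answer: str, gold: str) -> float:
    return 1.0 if answer == gold else 0.0

def hpt_signal(sample_fn, gold: str, n_rollouts: int = 8) -> str:
    rewards = [verify(sample_fn(), gold) for _ in range(n_rollouts)]
    # exploitable on-policy signal -> RL; no success -> fall back to SFT
    return "rl" if any(rewards) else "sft"

strong_sampler = lambda: "42"   # toy policy that already solves the prompt
weak_sampler = lambda: "17"     # toy policy that never solves it
print(hpt_signal(strong_sampler, gold="42"))  # rl
print(hpt_signal(weak_sampler, gold="42"))    # sft
```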

Learning to Reason under Off-Policy Guidance
22 Jun 2025

LUFFY introduces a framework that enhances Large Reasoning Models (LRMs) by integrating off-policy guidance into Reinforcement Learning with Verifiable Rewards (RLVR). This approach enables LRMs to acquire new reasoning capabilities from stronger external policies, achieving state-of-the-art performance on math benchmarks, superior generalization on out-of-distribution tasks, and successfully training weaker foundation models where on-policy methods fail.
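
A sketch of the core mixed-policy idea: rollouts from the learner and traces from a stronger policy share one group, so group-normalized advantages pull the learner toward the expert exactly when its own attempts fail. This simplification omits LUFFY's importance weighting and policy shaping:

```python
# Hedged sketch of the mixed-policy idea: off-policy traces from a stronger
# model join the on-policy group before group-normalized advantages are
# computed, so failing learners get pulled toward successful external
# reasoning. Omits LUFFY's importance weighting and policy shaping.
import torch

def mixed_group_advantages(on_policy_rewards, off_policy_rewards):
    rewards = torch.cat([on_policy_rewards, off_policy_rewards])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    n_on = on_policy_rewards.numel()
    return adv[:n_on], adv[n_on:]       # advantages per source

on_r = torch.tensor([0.0, 0.0, 0.0, 0.0])   # the learner keeps failing
off_r = torch.tensor([1.0, 1.0])            # stronger-policy traces succeed
print(mixed_group_advantages(on_r, off_r))  # expert tokens get positive signal
```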

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

F1, a Vision-Language-Action (VLA) model, integrates explicit visual foresight into its decision-making process, moving beyond purely reactive control. This approach yields enhanced robustness in dynamic environments and improved generalization across a range of real-world and simulated robotic manipulation tasks.

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
19 Apr 2025

InternVL3 establishes a new native multimodal pre-training paradigm for MLLMs, allowing the model to jointly acquire visual and linguistic capabilities from the outset. This approach achieves state-of-the-art performance among open-source models, reaching 72.2 on the MMMU benchmark, and demonstrates strong competitiveness with leading proprietary models across a wide range of multimodal tasks.

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Lumina-DiMOO is an omni diffusion large language model that uses fully discrete diffusion for unified multi-modal generation and understanding. It achieves state-of-the-art performance across text-to-image, image-to-image, and visual understanding benchmarks, and generates text-to-image samples 32x faster than leading autoregressive models.

SIM-CoT: Supervised Implicit Chain-of-Thought

SIM-CoT stabilizes and enhances implicit Chain-of-Thought reasoning in large language models by adding fine-grained, step-level supervision to latent tokens during training. It addresses latent-space instability, achieves higher accuracy than explicit CoT in some settings while preserving inference efficiency, and makes the model's latent reasoning steps inspectable.
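
A hedged sketch of what step-level latent supervision could look like in training code: an auxiliary decoder maps each latent thought vector to the tokens of its aligned explicit reasoning step, and is discarded at inference. The module sizes, the 1:1 step alignment, and the linear "decoder" are illustrative assumptions:

```python
# Hedged sketch of step-level supervision for latent tokens: an auxiliary
# decoder is trained to recover each aligned explicit reasoning step from
# the corresponding latent state, adding a cross-entropy term during
# training. Sizes, alignment, and the linear "decoder" are illustrative.
import torch
import torch.nn as nn

hidden, vocab, step_len = 64, 100, 5
aux_decoder = nn.Linear(hidden, vocab)        # stand-in for a small decoder

latent_states = torch.randn(3, hidden, requires_grad=True)   # 3 latent steps
gold_steps = torch.randint(0, vocab, (3, step_len))          # explicit steps

logits = aux_decoder(latent_states)                          # (3, vocab)
loss = nn.functional.cross_entropy(
    logits.repeat_interleave(step_len, dim=0),   # one prediction per token
    gold_steps.reshape(-1),
)
loss.backward()      # supervision flows into the latent pathway;
print(loss.item())   # the auxiliary decoder is dropped at inference
```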

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Researchers at University of Illinois Urbana-Champaign, Tsinghua University, Peking University, and Shanghai AI Laboratory provide evidence that large language models (LLMs) can acquire new, generalizable compositional skills through reinforcement learning (RL) post-training. Their controlled synthetic experiments show that RL enables LLMs to compose atomic skills for complex tasks, demonstrating significant generalization to unseen difficulties and cross-task transfer, a capability not achieved by supervised fine-tuning.
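
The flavor of the controlled setup, with string transformations standing in for atomic skills and their compositions as harder tasks; the specific transforms are invented for illustration:

```python
# Hedged sketch of the controlled compositional setting: string
# transformations stand in for atomic skills, and chaining them yields
# harder composite tasks. The specific transforms are invented.
ATOMIC_SKILLS = {
    "f": lambda s: s[::-1],              # reverse
    "g": lambda s: s.upper(),            # uppercase
    "h": lambda s: s.replace("a", "@"),  # substitution
}

def compose(names, x):
    """Apply skills right-to-left, e.g. ("f", "g") computes f(g(x))."""
    for name in reversed(names):
        x = ATOMIC_SKILLS[name](x)
    return x

assert compose(("g",), "abc") == "ABC"        # atomic skill
assert compose(("f", "g"), "abc") == "CBA"    # composition f(g(x))
print(compose(("f", "g", "h"), "banana"))     # deeper, unseen composition
```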

SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
19 May 2025

Shanghai AI Laboratory's SpatialVLA introduces spatial representations for Vision-Language-Action models: Ego3D Position Encoding injects 3D spatial understanding into observations, while Adaptive Action Grids unify action representation across robots. The model achieves state-of-the-art zero-shot performance on diverse real-world robot tasks and adapts efficiently to new robot setups, surpassing larger models such as RT-2-X despite a smaller parameter count.
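
A hedged sketch of the adaptive-grid intuition: bin edges follow the empirical action distribution rather than a uniform grid, so discretization resolution concentrates where actions actually occur. Quantile binning here is a stand-in for SpatialVLA's actual construction:

```python
# Hedged sketch of the adaptive-grid intuition: quantile-based bin edges
# put equal data mass in every bin, concentrating resolution where actions
# occur. A stand-in for SpatialVLA's construction, not its implementation.
import numpy as np

def adaptive_bins(actions, n_bins=8):
    """Bin edges at empirical quantiles of the action distribution."""
    return np.quantile(actions, np.linspace(0, 1, n_bins + 1))

rng = np.random.default_rng(0)
deltas = rng.normal(0.0, 0.02, size=10_000)   # small motions dominate
edges = adaptive_bins(deltas)
tokens = np.digitize(deltas, edges[1:-1])     # continuous action -> token id
print(edges.round(3))
print(np.bincount(tokens))                    # roughly uniform token usage
```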

Planning-oriented Autonomous Driving
23 Mar 2023

This paper introduces UniAD, a comprehensive, planning-oriented end-to-end framework for autonomous driving that integrates perception, prediction, and planning tasks into a single neural network. It achieves state-of-the-art performance across all integrated tasks on the nuScenes benchmark, demonstrating improved accuracy in motion forecasting and planning safety over previous methods.

Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning

The PeCL framework enables privacy-enhanced continual learning for Large Language Models by introducing a novel token-level dynamic differential privacy mechanism and a privacy-guided memory sculpting module. This approach allows models to selectively forget sensitive information while efficiently retaining crucial task-invariant knowledge, outperforming existing baselines in balancing utility and privacy.
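
A hedged sketch of the token-level intuition: noise is scaled by a per-token sensitivity score, so private entities are perturbed strongly while task-relevant tokens stay informative. The scores and the noise rule below are illustrative, not PeCL's actual mechanism:

```python
# Hedged sketch of the token-level intuition: noise scale follows a
# per-token sensitivity score, perturbing private entities strongly while
# leaving task-relevant tokens informative. Scores and the noise rule are
# illustrative, not PeCL's actual mechanism.
import numpy as np

def sculpt_embeddings(emb, sensitivity, base_sigma=0.5, seed=0):
    """Add Gaussian noise scaled by each token's sensitivity score."""
    rng = np.random.default_rng(seed)
    return emb + base_sigma * sensitivity[:, None] * rng.normal(size=emb.shape)

tokens = ["Alice", "lives", "in", "Paris"]
sens = np.array([1.0, 0.0, 0.0, 0.8])   # names/locations scored as sensitive
emb = np.zeros((len(tokens), 8))
noised = sculpt_embeddings(emb, sens)
print(np.abs(noised).mean(axis=1).round(2))  # only sensitive rows perturbed
```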

EO-1: Interleaved Vision-Text-Action Pretraining for General Robot Control

The EO-Robotics Team developed EO-1, a 3B parameter embodied foundation model, employing a unified architecture and interleaved vision-text-action pretraining for general robot control. The model achieved state-of-the-art performance, surpassing GPT-4o and Gemini 1.5 Flash in overall embodied reasoning, and demonstrated an 86.0% completion rate across 28 diverse real-world manipulation tasks.

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
26 Sep 2025

Shanghai AI Laboratory's InternVL 2.5 introduces an advanced series of open-source multimodal large language models (MLLMs) that achieve competitive performance against leading commercial models. The research systematically explores model, data, and test-time scaling strategies, resulting in the first open-source MLLM to surpass 70% on the challenging MMMU benchmark.
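
Of the three scaling axes, test-time scaling is the easiest to sketch: sample several candidate answers and aggregate them, e.g., by majority vote (self-consistency). The stand-in below is generic, not InternVL 2.5's exact recipe:

```python
# Hedged sketch of one test-time scaling strategy: sample N candidates and
# aggregate by majority vote (self-consistency). Generic illustration, not
# InternVL 2.5's exact recipe.
import random
from collections import Counter

def best_of_n(sample_fn, n=8):
    """Draw n candidate answers and return the most frequent one."""
    return Counter(sample_fn() for _ in range(n)).most_common(1)[0][0]

random.seed(1)
noisy_model = lambda: random.choice(["42", "42", "42", "17", "99"])
print(best_of_n(noisy_model))   # the vote suppresses occasional errors
```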
