alphaXiv

This survey paper from researchers at Tongji University and Fudan University offers a comprehensive, systematic synthesis of Retrieval-Augmented Generation (RAG) for Large Language Models. It structures the field by delineating three evolutionary paradigms—Naive, Advanced, and Modular RAG—and details advancements across retrieval, generation, and augmentation components, while also providing a thorough framework for evaluating RAG systems.

16 Jun 2025

agents chain-of-thought computer-science

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

UCLA Tongji University NUS NTU A*STAR

Kun Wang

G-Memory introduces a hierarchical, graph-based memory system for Large Language Model-based Multi-Agent Systems (MAS), enabling them to learn from complex collaborative histories. The system consistently improves MAS performance, with gains of up to 20.89% in embodied action and 10.12% in knowledge question-answering tasks, while remaining resource-efficient.

23 Sep 2025

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Zhejiang University The University of Edinburgh Baichuan Inc.

The ReSearch framework enables Large Language Models to integrate multi-step reasoning with external search, learning interactively via reinforcement learning without supervised intermediate steps. It yields substantial performance gains on complex multi-hop question answering benchmarks and reveals emergent self-correction capabilities.

03 Dec 2025

computer-science continual-learning computation-and-language

MemOS: A Memory OS for AI System

University of Science and Technology of China

Beihang University

Renmin University of China

Zhejiang University

Peking University Institute for Advanced Algorithms Research, Shanghai Research Institute of China Telecom MemTensor (Shanghai) Technology Co., Ltd.

Jiahao Huo

MemOS, a memory operating system for AI systems, redefines memory as a first-class system resource to address current Large Language Model limitations in long-context reasoning, continuous personalization, and knowledge evolution. This framework unifies heterogeneous memory types (plaintext, activation, parameter) using a standardized MemCube unit, achieving superior performance on benchmarks like LoCoMo and PreFEval, and demonstrating robust, low-latency memory operations.

24 Oct 2025

adversarial-robustness agents computer-science

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

National University of Singapore Tongji University

Fudan University Shanghai Innovation Institute

Researchers introduced LIBERO-Plus, a diagnostic benchmark for vision-language-action (VLA) models, revealing that current models exhibit substantial fragility to environmental perturbations and frequently ignore linguistic instructions. Fine-tuning with a generalized dataset significantly enhances their robustness.

20 Oct 2025

agentic-frameworks agents ai-for-health

From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

Shanghai Artificial Intelligence Laboratory Tongji University

The Chinese University of Hong Kong

Tsinghua University

Zhejiang University

University of British Columbia

Stony Brook University

Shanghai Jiaotong University Lingang Laboratory

This survey establishes "Agentic Science" as a paradigm for autonomous scientific discovery, offering a unified framework that integrates agent capabilities, scientific workflows, and domain-specific applications across natural sciences. It charts the evolution of AI from computational tools to autonomous research partners, highlighting over 20 validated scientific discoveries made by AI agents.

09 Jun 2025

Multi-agent Architecture Search via Agentic Supernet

National University of Singapore Tongji University

Shanghai AI Laboratory

University of Science and Technology of China

Kun Wang

MaAS introduces an 'agentic supernet' to enable dynamic, query-dependent allocation of multi-agent system resources and architectures. The framework achieves superior performance across diverse tasks while significantly reducing inference costs and demonstrating strong adaptability compared to prior fixed-architecture methods.

06 Feb 2025

adversarial-robustness computer-science machine-learning

G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks

CUHK Shanghai AI Laboratory Tongji University

Emory University USTC

Kun Wang

G-Designer introduces a framework that uses Graph Neural Networks to dynamically generate task-aware communication topologies for LLM-based multi-agent systems. The approach achieves superior performance on various benchmarks while reducing token consumption by up to 95.33% and demonstrating high adversarial robustness.

20 Sep 2025

computer-science computation-and-language fine-tuning

Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle

The Chinese University of Hong Kong Lancaster University The University of Toronto ByteDance SAIL Team

This survey offers a comprehensive review of how Reinforcement Learning (RL) is applied across the entire lifecycle of Large Language Models, from pre-training to alignment and reinforced reasoning. It particularly emphasizes the role of Reinforcement Learning with Verifiable Rewards (RLVR) in advancing LLM reasoning capabilities and compiles key datasets, benchmarks, and open-source tools for the field.

30 Nov 2025

computer-science computation-and-language computer-vision-and-pattern-recognition

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Fudan University Shanghai Innovation Institute

SRPO (Self-Referential Policy Optimization) enhances Vision-Language-Action (VLA) models for robotic manipulation by addressing reward sparsity: it generates dense, progress-wise rewards from the model's own successful trajectories and from latent world representations produced by V-JEPA 2. The method achieved a 99.2% success rate on the LIBERO benchmark, a 103% relative improvement over its one-shot SFT baseline, and demonstrated strong generalization on the LIBERO-Plus benchmark.

05 Oct 2025

chain-of-thought computer-science computation-and-language

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Imperial College London

University of Southern California Tongji University

Nanjing University

University of Michigan

ByteDance

University of Minnesota

The University of Hong Kong

Duke University Case Western Reserve University

The Ohio State University Kean University

SRPO enhances multimodal large language models by integrating explicit self-reflection and self-correction capabilities through a two-stage training framework. The approach achieves state-of-the-art performance among open-source models, with SRPO-32B scoring 78.5% on MathVista, and shows competitive results against leading closed-source models across diverse reasoning benchmarks.

27 May 2025

computer-science computer-vision-and-pattern-recognition image-generation

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Wuhan University Shanghai AI Laboratory Tongji University

computer-science computational-engineering-finance-and-science

Wenhao Chai

The RISEBench benchmark evaluates Large Multi-modality Models (LMMs) on Reasoning-Informed Visual Editing (RISE), assessing their capacity for visual modifications guided by complex instructions and logical inference across temporal, causal, spatial, and logical domains. Initial evaluations show leading proprietary models like GPT-4o-Image achieve only 28.9% overall accuracy, indicating substantial challenges, particularly in logical reasoning tasks.

22 Oct 2025

Chem-R: Learning to Reason as a Chemist

University of Science and Technology of China

Nanjing University Shanghai AI Lab The Chinese University of Hong Kong, Shenzhen

The University of Hong Kong Hong Kong Polytechnic University

David Wang

Although large language models (LLMs) have significant potential to advance chemical discovery, current LLMs lack core chemical knowledge, produce unreliable reasoning trajectories, and exhibit suboptimal performance across diverse chemical tasks. To address these challenges, we propose Chem-R, a generalizable Chemical Reasoning model designed to emulate the deliberative processes of chemists. Chem-R is trained through a three-phase framework that progressively builds advanced reasoning capabilities: (1) Chemical Foundation Training, which establishes core chemical knowledge; (2) Chemical Reasoning Protocol Distillation, which incorporates structured, expert-like reasoning traces to guide systematic and reliable problem solving; and (3) Multi-task Group Relative Policy Optimization, which optimizes the model for balanced performance across diverse molecular- and reaction-level tasks. This structured pipeline enables Chem-R to achieve state-of-the-art performance on comprehensive benchmarks, surpassing leading large language models, including Gemini-2.5-Pro and DeepSeek-R1, by up to 32% on molecular tasks and 48% on reaction tasks. Chem-R also consistently outperforms existing chemical foundation models on both molecular- and reaction-level tasks. These results highlight Chem-R's robust generalization, interpretability, and potential as a foundation for next-generation AI-driven chemical discovery. The code and model are available.

25 Sep 2025

agent-based-systems causal-inference computer-science

R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization

R&D-Agent-Quant (R&D-Agent(Q)) is a multi-agent framework that automates the entire quantitative financial research and development pipeline by jointly optimizing financial factors and predictive models. It achieves up to 2 times higher annualized returns than state-of-the-art baselines while using 70% fewer factors, and it demonstrates robustness across diverse financial markets.

22 Sep 2025

chain-of-thought computer-science artificial-intelligence

Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities

Aalborg University Southeast University

This comprehensive survey proposes a two-dimensional taxonomy for research on integrating Large Language Models and Knowledge Graphs (KGs) for Question Answering, organizing methods by complex QA category and by the specific function the KG serves. The work reviews state-of-the-art techniques, highlights their utility in mitigating LLM weaknesses such as factual inaccuracy and limited reasoning, and identifies critical future research directions.

27 Aug 2025

Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science

Shanghai University of Finance and Economics Tongji University

Beihang University Tanka AI

NEMORI, a self-organizing agent memory system, was developed by researchers from Tongji University, Shanghai University of Finance and Economics, Beihang University, and Tanka AI, drawing inspiration from cognitive science to address the 'amnesia' of large language models. The system established new state-of-the-art performance on the LoCoMo dataset with an LLM score of 0.744 using gpt-4o-mini, while simultaneously reducing token usage by 88% compared to full context baselines.

27 Nov 2025

attention-mechanisms computer-science computer-vision-and-pattern-recognition

Architecture Decoupling Is Not All You Need For Unified Multimodal Model

University of Science and Technology of China Meituan CUHK MMLab

This work finds that architectural decoupling in unified multimodal models (UMMs) improves performance by inducing task-specific attention patterns rather than by eliminating task conflicts. Researchers from CUHK MMLab and Meituan introduce an Attention Interaction Alignment (AIA) loss, a regularization technique that guides UMMs' attention toward optimal task-specific behaviors without architectural changes, enhancing both understanding and generation performance for models like Emu3 and Janus-Pro.

24 Sep 2025

agents computer-science artificial-intelligence

Do You Need Proprioceptive States in Visuomotor Policies?

New York University Tongji University

Tsinghua University Spirit AI

Robotic visuomotor policies achieve dramatically improved spatial generalization by removing proprioceptive state inputs. This approach leverages a relative end-effector action space and comprehensive egocentric vision to enable robust task performance across varied spatial configurations, while also enhancing data efficiency and cross-embodiment adaptation.

23 Sep 2025

computer-science robotics

Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI

Tianjin University Huawei Noah’s Ark Lab

Chinese Academy of Sciences

Imperial College London

Sun Yat-Sen University

University of Manchester

University College London Tongji University

King’s College London TU Darmstadt Pengcheng Laboratory Hong Kong University of Science and Technology (Guangzhou)

Nanjing University

Tsinghua University

Peking University

Shaojin Ma

Researchers from a global consortium, including Tianjin University and Huawei Noah’s Ark Lab, developed Embodied Arena, a comprehensive platform for evaluating Embodied AI agents, featuring a systematic capability taxonomy and an automated, LLM-driven data generation pipeline. This platform integrates over 22 benchmarks and 30 models, revealing that specialized embodied models often outperform general models on targeted tasks and identifying object and spatial perception as key performance bottlenecks.

16 Feb 2025