alphaXiv

National Key Laboratory of Smart Farm Technologies and Systems Harbin Institute of Technology

A Survey of Reinforcement Learning for Large Reasoning Models

09 Oct 2025

University of Washington Shanghai AI Laboratory

This survey paper systematically synthesizes advancements in Reinforcement Learning (RL) for Large Reasoning Models (LRMs), moving beyond human alignment to focus on enhancing intrinsic reasoning capabilities through verifiable rewards. It identifies key components, challenges, and future directions for scaling RL towards Artificial SuperIntelligence (ASI).

#computer-science #artificial-intelligence #computation-and-language

CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks

21 Aug 2025

National University of Singapore Beihang University

CopyrightShield, developed by researchers from Nanyang Technological University and Beihang University, establishes a defense framework to protect diffusion models from copyright infringement attacks by detecting poisoned training samples and mitigating their influence. The approach achieves an F1-score of 0.665 for poisoned sample detection, which is a 25% improvement over prior attribution methods, and reduces the copyright infringement rate by 56.7% while delaying attack initiation by 115.2%, all without compromising generative quality.

#adversarial-robustness #computer-science #artificial-intelligence

Two-Person Adversarial Games are Zero-Sum: An Elaboration of a Folk Theorem

31 Jul 2024

Harbin Institute of Technology City University of New York

The observation that every two-person adversarial game is an affine transformation of a zero-sum game is traceable to Luce & Raiffa (1957) and was made explicit in Aumann (1987). Recent work of Adler et al. (2009) (ADP), and of Raimondo (2023) in increasing generality, proves what had until then remained a conjecture. We present two proofs of an even more general formulation: the first draws on multilinear utility theory developed by Fishburn & Roberts (1978); the second is a consequence of the ADP proof itself for a special case of a two-player game with a set of three actions.

#economics #theoretical-economics

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

06 Nov 2025

Fudan University

The Chinese University of Hong Kong

This research introduces "Thinking with Video," a new paradigm that leverages video generation for multimodal reasoning by enabling dynamic visualization and human-like imagination in problem-solving. It evaluates frontier video models like Sora-2 on a new, comprehensive benchmark, VideoThinkBench, showcasing their unexpected capabilities across vision and text-centric tasks.

#computer-science #computation-and-language #computer-vision-and-pattern-recognition

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence

06 Dec 2025

Monash University CSIRO

A comprehensive synthesis of Large Language Models for automated software development covers the entire model lifecycle, from data curation to autonomous agents, and offers practical guidance derived from empirical experiments on pre-training, fine-tuning, and reinforcement learning, alongside a detailed analysis of challenges and future directions.

#agentic-frameworks #agents #ai-for-cybersecurity

Molecular Bubble and Outflow in S Mon Revealed by Multiband Datasets

31 Jan 2024

Chinese Academy of Sciences

University of Science and Technology of China

We identify a molecular bubble, and study the star formation and its feedback in the S Mon region, using multiple molecular lines, young stellar objects (YSOs), and infrared data. We revisit the distance to S Mon, ~722 ± 9 pc, using Gaia Data Release 3 parallaxes of the associated Class II YSOs. The bubble may be mainly driven by a massive binary system (namely 15 Mon), the primary of which is an O7V-type star. An outflow is detected in the shell of the bubble, suggesting ongoing star formation activities in the vicinity of the bubble. The total wind energy of the massive binary star is three orders of magnitude higher than the sum of the observed turbulent energy in the molecular gas and the kinetic energy of the bubble, indicating that stellar winds help to maintain the turbulence in the S Mon region and drive the bubble. We conclude that the stellar winds of massive stars have an impact on their surrounding environment.

#astrophysics-of-galaxies #physics

RLPR: Extrapolating RLVR to General Domains without Verifiers

23 Jun 2025

University of Illinois at Urbana-Champaign

National University of Singapore

Researchers from Tsinghua University and NUS developed RLPR, a verifier-free reinforcement learning framework that enhances Large Language Model reasoning across general domains by using an intrinsic probability-based reward. This method achieved an average 24.9% improvement on general-domain benchmarks and consistently outperformed existing RLVR and concurrent verifier-free approaches by removing the need for external verification.

#agents #computer-science #artificial-intelligence

MemoryBank: Enhancing Large Language Models with Long-Term Memory

21 May 2023

Wanjun Zhong

Sun Yat-Sen University Harbin Institute of Technology

MemoryBank introduces a novel long-term memory mechanism for Large Language Models, enabling them to retain and recall information across extended interactions by simulating human-like forgetting and reinforcement. The system, demonstrated through the SiliconFriend chatbot, significantly enhances contextual understanding, personalizes user interactions through dynamic user portraits, and provides empathetic responses, showing strong performance across various LLMs in both qualitative and quantitative evaluations.

#computer-science #conversational-ai #artificial-intelligence

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

18 Jul 2025

Fudan University Central South University

Researchers from Harbin Institute of Technology and collaborating institutions provide a systematic survey of Long Chain-of-Thought (Long CoT) in Large Language Models, establishing a formal distinction from Short CoT. The survey proposes a novel taxonomy based on deep reasoning, extensive exploration, and feasible reflection, and analyzes key phenomena observed in advanced reasoning models.

#agents #chain-of-thought #computer-science

AI4Research: A Survey of Artificial Intelligence for Scientific Research

05 Aug 2025

jiaqi wang

Fudan University Central South University

Researchers from Harbin Institute of Technology and collaborators present a systematic survey of Artificial Intelligence for Scientific Research (AI4Research), defining its scope, proposing a comprehensive taxonomy across the entire research lifecycle, and identifying critical future directions. The study clarifies the distinction between AI4Research and AI4Science, demonstrating AI's growing capabilities from scientific comprehension to peer review, while highlighting significant challenges in achieving ethical, explainable, and fully autonomous systems.

#agentic-frameworks #agents #computer-science

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

08 Oct 2025

Chinese Academy of Sciences UC Berkeley

RLinf-VLA presents a unified and efficient framework for training Vision-Language-Action (VLA) models with reinforcement learning (RL), achieving up to 2.27x speedup and establishing new performance benchmarks, including a 98.11% success rate on LIBERO-130 and improved real-world zero-shot generalization over supervised methods.

#computer-science #robotics

MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

01 Sep 2025

Ant Group Harbin Institute of Technology

Ant Group researchers developed MedResearcher-R1, an expert-level medical deep research agent capable of complex, multi-hop reasoning over medical information using specialized tools. It achieved a new state-of-the-art pass@1 score of 27.5/50 on the MedBrowseComp benchmark, outperforming leading proprietary systems, while maintaining competitive performance on general deep research tasks.

#agentic-frameworks #agents #ai-for-health

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

05 Apr 2025

Kaiyan Zhang

Jian Zhao

Shanghai AI Laboratory Tsinghua University

GenPRM, from researchers at Tsinghua University and Shanghai AI Laboratory, introduces a generative approach to Process Reward Models (PRMs) that produces explicit Chain-of-Thought reasoning and integrates code verification, enabling the PRM itself to benefit from test-time scaling. This allows GenPRM to outperform prior classification-based PRMs and serve as an effective verifier and critic for policy models in mathematical reasoning.

#chain-of-thought #computer-science #computation-and-language

AutoPR: Let's Automate Your Academic Promotion!

15 Oct 2025

ByteDance Central South University

As the volume of peer-reviewed research surges, scholars increasingly rely on social platforms for discovery, while authors invest considerable effort in promoting their work to ensure visibility and citations. To streamline this process and reduce the reliance on human effort, we introduce Automatic Promotion (AutoPR), a novel task that transforms research papers into accurate, engaging, and timely public content. To enable rigorous evaluation, we release PRBench, a multimodal benchmark that links 512 peer-reviewed articles to high-quality promotional posts, assessing systems along three axes: Fidelity (accuracy and tone), Engagement (audience targeting and appeal), and Alignment (timing and channel optimization). We also introduce PRAgent, a multi-agent framework that automates AutoPR in three stages: content extraction with multimodal preparation, collaborative synthesis for polished outputs, and platform-specific adaptation to optimize norms, tone, and tagging for maximum reach. When compared to direct LLM pipelines on PRBench, PRAgent demonstrates substantial improvements, including a 604% increase in total watch time, a 438% rise in likes, and at least a 2.9x boost in overall engagement. Ablation studies show that platform modeling and targeted promotion contribute the most to these gains. Our results position AutoPR as a tractable, measurable research problem and provide a roadmap for scalable, impactful automated scholarly communication.

#agentic-frameworks #agents #computer-science

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought

26 Oct 2025

jiaqi wang

Shanghai AI Laboratory

National University of Singapore

Large Vision-Language Models (LVLMs) have achieved significant success in multimodal tasks, with multimodal chain-of-thought (MCoT) further enhancing performance and interpretability. Recent MCoT methods fall into two categories: (i) Textual-MCoT (T-MCoT), which takes multimodal input and produces textual output; and (ii) Interleaved-MCoT (I-MCoT), which generates interleaved image-text outputs. Despite advances in both approaches, the mechanisms driving these improvements are not fully understood. To fill this gap, we first reveal that MCoT boosts LVLMs by incorporating visual thoughts, which convey image information to the reasoning process regardless of the MCoT format, depending only on clarity and conciseness of expression. Furthermore, to explore visual thoughts systematically, we define four distinct forms of visual thought expressions and analyze them comprehensively. Our findings demonstrate that these forms differ in clarity and conciseness, yielding varying levels of MCoT improvement. Additionally, we explore the internal nature of visual thoughts, finding that they serve as intermediaries that carry information from the input image to the reasoning in deeper transformer layers, enabling more advanced visual information transmission. We hope that visual thoughts can inspire further breakthroughs in future MCoT research.

#chain-of-thought #computer-science #computation-and-language

CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

01 Dec 2025

Harbin Institute of Technology Pengcheng Laboratory

As Large Language Models (LLMs) are increasingly adopted across the multilingual world, ensuring hallucination-free factuality becomes crucial. However, existing benchmarks for evaluating the reliability of Multimodal Large Language Models (MLLMs) predominantly focus on textual or visual modalities with a primary emphasis on English, which creates a gap in evaluation when processing multilingual input, especially in speech. To bridge this gap, we propose a novel Cross-lingual and Cross-modal Factuality benchmark (CCFQA). Specifically, the CCFQA benchmark contains parallel speech-text factual questions across 8 languages, designed to systematically evaluate MLLMs' cross-lingual and cross-modal factuality capabilities. Our experimental results demonstrate that current MLLMs still face substantial challenges on the CCFQA benchmark. Furthermore, we propose a few-shot transfer learning strategy that effectively transfers the Question Answering (QA) capabilities of LLMs in English to multilingual Spoken Question Answering (SQA) tasks, achieving competitive performance with GPT-4o-mini-Audio using just 5-shot training. We release CCFQA as a foundational research resource to promote the development of MLLMs with more robust and reliable speech understanding capabilities. Our code and dataset are publicly available.

#computer-science #computation-and-language #few-shot-learning

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

22 Aug 2025

Shanghai Artificial Intelligence Laboratory Tsinghua University

MeshCoder reconstructs complex 3D objects from point clouds into editable Blender Python scripts, achieving an L2 Chamfer Distance of 0.06 (×10⁻²) while enabling precise geometric and topological editing. This method allows large language models to better reason about 3D shapes through semantically rich code, substantially outperforming existing shape-to-code baselines.

#computer-science #computer-vision-and-pattern-recognition #graphics

A Survey on Video Temporal Grounding with Multimodal Large Language Model

07 Aug 2025

Harbin Institute of Technology

The Hong Kong Polytechnic University

This paper comprehensively surveys Video Temporal Grounding with Multimodal Large Language Models (VTG-MLLMs), presenting a novel three-dimensional taxonomy to classify methodologies and analyzing performance across diverse tasks and benchmarks. It provides a structured overview of architectural integrations, training strategies, and video feature processing techniques, consolidating advancements in the field.

#computer-science #computer-vision-and-pattern-recognition #multi-modal-learning

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

11 Jun 2025

Shanghai Jiao Tong University University of Electronic Science and Technology of China

EfficientVLA presents a training-free framework that accelerates and compresses diffusion-based Vision-Language-Action (VLA) models by systematically addressing redundancies in the language module, visual processing, and iterative action head. The approach achieved a 1.93x inference speedup and over 70% FLOPs reduction with only a 0.6% drop in success rate, enabling more practical deployment on robotic platforms.

#agents #computer-science #computer-vision-and-pattern-recognition

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use

16 Sep 2025

Huawei Noah’s Ark Lab Harbin Institute of Technology

Tool-R1, developed by Harbin Institute of Technology and Huawei Noah’s Ark Lab, introduces a reinforcement learning framework that enables Large Language Models to use external tools by generating executable Python code, achieving sample-efficient training through dynamic data management and outcome-driven rewards. This framework elevates agentic capabilities, reaching an Answer Accuracy of 26.67% on the GAIA benchmark, outperforming baselines with significantly less training data.

#agentic-frameworks #agents #chain-of-thought
