ControlMath, from the Hong Kong University of Science and Technology (Guangzhou), introduces a framework for generating diverse mathematical reasoning data: it starts from controlled equations and applies an adaptive selection mechanism to surface challenging problems. This approach significantly improves the generalization of large language models across mathematical domains while also improving training efficiency.
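As a rough illustration of such an equation-first pipeline, the sketch below uses hypothetical helpers (`sample_equation`, `equation_to_problem`, `solver_pass_rate`) in place of the paper's actual generators and solver; a candidate problem is kept only when a solver model's pass rate falls below a threshold, which is one simple way to realize adaptive selection of challenging problems.

```python
import random

def sample_equation(num_terms: int) -> str:
    """Sample a simple linear equation with a controlled number of terms (hypothetical generator)."""
    coeffs = [random.randint(1, 9) for _ in range(num_terms)]
    rhs = sum(c * random.randint(1, 5) for c in coeffs)
    lhs = " + ".join(f"{c}*x{i}" for i, c in enumerate(coeffs))
    return f"{lhs} = {rhs}"

def equation_to_problem(equation: str) -> str:
    """Placeholder for an LLM call that wraps the equation in a natural-language word problem."""
    return f"Find non-negative integers x0, x1, ... satisfying: {equation}"

def solver_pass_rate(problem: str, attempts: int = 8) -> float:
    """Placeholder for querying a solver model several times and measuring its success rate."""
    return random.random()  # stand-in; a real pipeline would grade sampled model answers

def build_dataset(n_candidates: int, max_pass_rate: float = 0.5) -> list[str]:
    """Adaptive selection: keep only problems the solver finds hard (low pass rate)."""
    kept = []
    for _ in range(n_candidates):
        eq = sample_equation(num_terms=random.randint(2, 4))
        problem = equation_to_problem(eq)
        if solver_pass_rate(problem) <= max_pass_rate:
            kept.append(problem)
    return kept

if __name__ == "__main__":
    print(build_dataset(5))
```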
This research from Wuhan University and the National University of Singapore provides the first explicit constructions of purely quantum Parameterized Quantum Circuits (PQCs) for multivariate polynomials and smooth functions. The work establishes non-asymptotic approximation error bounds, demonstrating that PQCs can achieve exponentially smaller model sizes and parameter counts compared to deep ReLU neural networks for certain high-dimensional smooth functions.
Researchers from HKUST and Tencent's Keen Security Lab developed SAFESCA, an optimized Multi-Party Computation (MPC)-based framework for Software Composition Analysis (SCA) that cryptographically protects both client and vendor data. This framework achieves high accuracy and reduces the computational overhead of MPC-based SCA by 87% compared to naive implementations, making privacy-preserving SCA practical for real-world deployment.
This survey synthesizes the extensive and fragmented field of robot manipulation, providing a comprehensive overview that unifies diverse methodologies and challenges under novel classification systems. It structures the landscape by introducing new taxonomies for high-level planning, low-level learning-based control, and key bottlenecks, while outlining future research directions.
A comprehensive survey organizes research on implicit reasoning in Large Language Models (LLMs) by proposing a new execution-centric taxonomy and consolidating evidence for how multi-step reasoning unfolds internally without generating explicit text. The work highlights critical challenges and future directions to foster more efficient and robust LLM reasoning systems.
This survey provides a comprehensive analysis of 'reasoning Large Language Models,' detailing their transition from intuitive 'System 1' to deliberate 'System 2' thinking. It maps the foundational technologies, core construction methods, and evaluation benchmarks, highlighting their enhanced performance in complex tasks like mathematics and coding while also identifying current limitations and future research directions.
This survey paper defines and applies a 'full-stack' safety concept for Large Language Models (LLMs), systematically analyzing safety concerns across their entire lifecycle from data to deployment and commercialization. The collaboration synthesizes findings from over 900 papers, providing a unified taxonomy of attacks and defenses while identifying key insights and future research directions for LLM and LLM-agent safety.
Researchers introduce Prompt-R1, an end-to-end reinforcement learning framework where a small language model agent learns to generate optimal prompts for a large language model environment. This approach yields consistent performance improvements, strong generalization to unseen data, and robust transferability across diverse large language models for complex tasks.
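A minimal sketch of the agent-environment loop described here, with stand-in functions (`small_policy_generate_prompt`, `large_lm_environment`, `reward_fn`) replacing the actual models and reward: the small model is the trainable policy, the large model is treated as a frozen environment, and the scalar reward would feed a policy-gradient update on the small model only.

```python
import random

def small_policy_generate_prompt(task: str) -> str:
    """Stand-in for the small policy model proposing a prompt for the task."""
    templates = [
        "Solve step by step: {t}",
        "You are an expert. Answer concisely: {t}",
        "Break the problem into sub-goals, then answer: {t}",
    ]
    return random.choice(templates).format(t=task)

def large_lm_environment(prompt: str) -> str:
    """Stand-in for the frozen large LM that acts as the environment."""
    return f"<answer to: {prompt}>"

def reward_fn(answer: str, reference: str) -> float:
    """Stand-in reward: 1.0 if the reference answer appears in the output, else 0.0."""
    return float(reference in answer)

def rollout(task: str, reference: str) -> tuple[str, float]:
    """One agent-environment interaction: prompt -> answer -> scalar reward."""
    prompt = small_policy_generate_prompt(task)
    answer = large_lm_environment(prompt)
    return prompt, reward_fn(answer, reference)

if __name__ == "__main__":
    # Rewards collected from many such rollouts would drive an RL update
    # (e.g., PPO-style) on the small model; the large model stays fixed.
    print(rollout("What is 17 * 24?", "408"))
```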
VR-Bench, a new benchmark, is introduced to evaluate the spatial reasoning capabilities of video generation models through diverse maze-solving tasks. The paper demonstrates that fine-tuned video models can perform robust spatial reasoning, often outperforming Vision-Language Models, and exhibit strong generalization and a notable test-time scaling effect.
DexGraspVLA, a vision-language-action framework from Peking University, achieves an unprecedented 90.8% aggregated success rate for dexterous grasping in cluttered scenes across 1,287 unseen object, lighting, and background combinations. The framework leverages foundation models to iteratively transform diverse language and visual inputs into domain-invariant representations, enabling robust closed-loop control and strong zero-shot generalization for language-guided manipulation.
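The closed-loop aspect can be pictured as a loop that re-encodes the latest camera frame and instruction into (assumed) foundation-model features before every action; all names below are illustrative placeholders, not DexGraspVLA's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Domain-invariant features from frozen foundation-model encoders (assumed)."""
    visual: list[float]
    language: list[float]

def encode(frame: object, instruction: str) -> Observation:
    """Stand-in for the vision/language encoders that re-process every step."""
    return Observation(visual=[0.0] * 8, language=[float(len(instruction))] * 8)

def act(obs: Observation) -> list[float]:
    """Stand-in for the learned grasping policy mapping features to a hand/arm command."""
    return [0.0] * 7

def closed_loop_grasp(get_frame, instruction: str, steps: int = 5) -> list[list[float]]:
    """Closed-loop control: re-perceive, re-encode, and re-act at every step."""
    actions = []
    for _ in range(steps):
        obs = encode(get_frame(), instruction)   # fresh observation each step
        actions.append(act(obs))                 # a real system would send this to the robot
    return actions

if __name__ == "__main__":
    print(len(closed_loop_grasp(lambda: "frame", "pick up the red mug")))
```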
This paper systematically reviews state-of-the-art research on Large Language Model (LLM) agents in education, providing a task-centric taxonomy and examining their technological foundations, applications, and challenges. It details how these agents offer personalized and adaptive support for learning by categorizing them into pedagogical and domain-specific types.
Think-on-Graph 3.0 (ToG-3) introduces a Retrieval-Augmented Generation (RAG) framework that leverages a heterogeneous graph and a Multi-Agent Context Evolution and Retrieval (MACER) mechanism. This approach enables efficient and adaptive LLM reasoning, achieving state-of-the-art performance on complex tasks even with lightweight language models by dynamically refining the knowledge context.
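One way to picture the context-evolution loop, under the assumption of a simple adjacency-list graph and a placeholder judging agent (`judge_sufficient`); the real MACER mechanism coordinates multiple LLM agents over a heterogeneous graph, which this sketch does not reproduce.

```python
def retrieve_neighbors(graph: dict[str, list[str]], frontier: set[str]) -> set[str]:
    """Pull one hop of neighbors for the current frontier nodes."""
    expanded = set()
    for node in frontier:
        expanded.update(graph.get(node, []))
    return expanded

def judge_sufficient(context: set[str], question: str) -> bool:
    """Stand-in for an LLM agent deciding whether the evolved context answers the question."""
    return len(context) >= 6  # placeholder stopping rule

def evolve_context(graph: dict[str, list[str]], seeds: set[str],
                   question: str, max_rounds: int = 3) -> set[str]:
    """Iteratively expand the retrieved context until the judge is satisfied."""
    context, frontier = set(seeds), set(seeds)
    for _ in range(max_rounds):
        if judge_sufficient(context, question):
            break
        frontier = retrieve_neighbors(graph, frontier) - context
        # A pruning agent would drop irrelevant nodes here; kept simple for brevity.
        context |= frontier
    return context

if __name__ == "__main__":
    toy_graph = {"Q": ["doc1", "ent1"], "ent1": ["doc2", "ent2"], "ent2": ["doc3"]}
    print(evolve_context(toy_graph, {"Q"}, "toy question"))
```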
Researchers from USTC, BOSS Zhipin, and HKUST present a comprehensive survey of Large Language Models (LLMs) in recommendation systems. They introduce a taxonomy that distinguishes discriminative from generative LLM paradigms, focus in particular on the emerging field of Generative LLMs for Recommendation (GLLM4Rec), and outline key challenges and future research directions.
Self-play with Variational problem Synthesis (SVS), an online strategy for Reinforcement Learning with Verifiable Rewards (RLVR), enables Large Language Models to continuously generate and solve diverse, new problems, addressing policy entropy collapse. This method achieved absolute gains of 18.3% and 22.8% in Pass@32 on AIME 24 and AIME 25 benchmarks, and consistently improved Pass@k performance and generalization across various models and tasks.
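A toy sketch of one self-play round under strong simplifying assumptions (arithmetic problems, a rule-based verifier, stand-in `synthesize_variant` and `model_solve` functions); it only illustrates the generate-solve-verify cycle that an RLVR update would consume, not the paper's actual training recipe.

```python
import random

def synthesize_variant(problem: dict) -> dict:
    """Stand-in for the policy rewriting a seed problem into a new variant (numbers perturbed)."""
    return {"a": problem["a"] + random.randint(1, 3), "b": problem["b"] + random.randint(1, 3)}

def model_solve(problem: dict) -> int:
    """Stand-in for the policy's answer; occasionally wrong to mimic imperfect solving."""
    noise = random.choice([0, 0, 0, 1])
    return problem["a"] + problem["b"] + noise

def verifiable_reward(problem: dict, answer: int) -> float:
    """Rule-based verifier: exact match against the ground-truth sum."""
    return float(answer == problem["a"] + problem["b"])

def svs_round(seed_problems: list[dict]) -> list[tuple[dict, int, float]]:
    """One self-play round: synthesize variants, solve them, score with verifiable rewards."""
    trajectories = []
    for seed in seed_problems:
        variant = synthesize_variant(seed)
        answer = model_solve(variant)
        trajectories.append((variant, answer, verifiable_reward(variant, answer)))
    return trajectories  # in the real method, these feed the RLVR policy update

if __name__ == "__main__":
    print(svs_round([{"a": 2, "b": 3}, {"a": 7, "b": 5}]))
```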
VBench++, a comprehensive benchmark for video generation models from S-Lab at Nanyang Technological University and the Shanghai Artificial Intelligence Laboratory, evaluates performance across 16 distinct dimensions covering text-to-video, image-to-video, and trustworthiness. It provides granular insights into model capabilities and limitations, such as the trade-off between temporal consistency and motion, and persistent challenges in compositional understanding.
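The dimension-wise evaluation idea can be outlined as independent scorers aggregated per dimension; the scorer below is a placeholder and the dimension names are illustrative, not VBench++'s exact sixteen.

```python
def score_dimension(videos: list[str], dimension: str) -> float:
    """Stand-in for a dimension-specific evaluator (e.g., temporal consistency, motion)."""
    return 0.5  # placeholder score in [0, 1]

def evaluate_model(videos: list[str], dimensions: list[str]) -> dict[str, float]:
    """Score a model's outputs on each dimension separately rather than as one aggregate number."""
    return {dim: score_dimension(videos, dim) for dim in dimensions}

if __name__ == "__main__":
    dims = ["temporal_consistency", "motion", "text_alignment"]  # illustrative subset
    print(evaluate_model(["clip_0.mp4", "clip_1.mp4"], dims))
```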
Researchers from HKUST (Guangzhou) and the University of Surrey conducted a systematic survey of controllable text-to-speech (TTS) synthesis methods, focusing on the impact of Large Language Models (LLMs) and diffusion models. The survey provides a comprehensive taxonomy of architectures and control strategies, identifies current challenges such as fine-grained control, and proposes future research directions.
ConsistEdit introduces the first training-free attention control method specifically for Multi-Modal Diffusion Transformers (MM-DiT), enabling highly consistent and precise text-guided visual editing across both images and videos. The method achieves state-of-the-art performance by allowing strong, prompt-aligned edits while preserving structural integrity and content fidelity in unedited regions, alongside fine-grained control over consistency.
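A schematic, not ConsistEdit's exact rule: one common way to keep unedited regions consistent in training-free editing is to blend attention outputs computed against the edit prompt and against the source pass using a spatial mask, as in the NumPy sketch below (all shapes and names are illustrative).

```python
import numpy as np

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Plain scaled dot-product attention over the token dimension."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def mask_guided_edit_attention(q_edit, k_edit, v_edit, k_src, v_src, edit_mask):
    """Tokens inside `edit_mask` attend to the edit prompt's keys/values; tokens outside
    reuse the source pass, keeping unedited regions unchanged (illustrative rule only)."""
    edited = attention(q_edit, k_edit, v_edit)
    preserved = attention(q_edit, k_src, v_src)
    return edit_mask[:, None] * edited + (1.0 - edit_mask[:, None]) * preserved

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 16)); k = rng.normal(size=(8, 16)); v = rng.normal(size=(8, 16))
    mask = (rng.random(8) > 0.5).astype(float)   # 1 = token lies in the edited region
    print(mask_guided_edit_attention(q, k, v, k, v, mask).shape)
```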
Researchers from a global consortium, including Tianjin University and Huawei Noah’s Ark Lab, developed Embodied Arena, a comprehensive platform for evaluating Embodied AI agents, featuring a systematic capability taxonomy and an automated, LLM-driven data generation pipeline. This platform integrates over 22 benchmarks and 30 models, revealing that specialized embodied models often outperform general models on targeted tasks and identifying object and spatial perception as key performance bottlenecks.
BrowseComp-ZH introduces the first comprehensive benchmark for evaluating large language models' web browsing and reasoning capabilities in the Chinese information environment. The benchmark reveals consistently low performance across models and underscores the unique challenges of effectively integrating and reconciling retrieved information from the complex Chinese web.