alphaXiv

The Hong Kong University of Science and Technology (Guangzhou)

7,363

01 Aug 2025

agentic-frameworks agents ai-for-health

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Princeton AI Lab

University of Illinois at Urbana-Champaign

University of California, Santa Barbara

Carnegie Mellon University

Fudan University

Shanghai Jiao Tong University

Tsinghua University

University of Michigan

The Chinese University of Hong Kong

The Hong Kong University of Science and Technology (Guangzhou)

University of California, San Diego

Pennsylvania State University

The University of Hong Kong

Princeton University

University of Sydney

Oregon State University

An extensive international collaboration offers the first systematic review of self-evolving agents, establishing a unified theoretical framework categorized by 'what to evolve,' 'when to evolve,' and 'how to evolve'. The work consolidates diverse research, highlights key challenges, and maps applications, aiming to guide the development of AI systems capable of continuous autonomous improvement.

511

2,090

10 Oct 2024

computer-science artificial-intelligence computation-and-language

Are Large Language Models Good Statisticians?

The Hong Kong University of Science and Technology (Guangzhou)

HKUST

This research evaluates Large Language Models' (LLMs) proficiency in specialized statistical tasks, specifically their ability to assess the applicability of statistical methods. The paper introduces StatQA, a novel benchmark, and finds that fine-tuned LLMs achieve the highest accuracy at 77.13% on this benchmark, surpassing general-purpose LLMs and human experts.

1,552

22 Sep 2025

computer-science robotics

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Zhejiang University

Westlake University

The Hong Kong University of Science and Technology (Guangzhou)

Beijing University of Posts and Telecommunications

State Key Laboratory of Networking and Switching Technology

OpenHelix Team

Researchers from Beijing University of Posts and Telecommunications, Westlake University, and Zhejiang University, along with the OpenHelix Team, introduce VLA-Adapter, an efficient method to bridge vision-language representations to robotic actions. The approach enables state-of-the-art level performance with a tiny-scale 0.5B parameter backbone without robotic data pre-training, achieving a 97.3% average success rate on the LIBERO benchmark and providing a 3x faster inference speed (219.2 Hz) than comparable methods.

578

7,708

15 Apr 2025

agent-based-systems computer-science artificial-intelligence

AFlow: Automating Agentic Workflow Generation

Fudan University

Nanjing University

Renmin University of China

The Hong Kong University of Science and Technology (Guangzhou)

HKUST

King Abdullah University of Science and Technology

DeepWisdom

Université de Montréal & Mila

Xiong-Hui Chen

Bang Liu

AFLOW introduces an automated framework for generating and optimizing agentic workflows for Large Language Models, reformulating workflow optimization as a search problem over code-represented workflows. The system leverages Monte Carlo Tree Search with LLM-based optimization to iteratively refine workflows, yielding a 19.5% average performance improvement over existing automated methods while enabling smaller, more cost-effective LLMs to achieve performance parity with larger models.

130

22,905

11 Jul 2022

computer-science contrastive-learning computer-vision-and-pattern-recognition

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Tsinghua University

The Hong Kong University of Science and Technology (Guangzhou)

HKUST

International Digital Economy Academy (IDEA)

DINO, a Transformer-based end-to-end object detector, integrates improved denoising training, mixed query selection, and a 'look forward twice' scheme to significantly advance performance. The model achieved a state-of-the-art 63.2 AP on COCO val2017 using a SwinL backbone, outperforming previous highly optimized classical detectors.

2,387

4,794

29 Jan 2024

computer-science artificial-intelligence machine-learning

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Monash University

Alibaba Group

Ant Group

The Hong Kong University of Science and Technology (Guangzhou)

Griffith University

IBM Research

TIME-LLM introduces a reprogramming framework that adapts large language models for general time series forecasting by keeping the LLM backbone frozen. The approach achieves state-of-the-art performance across various benchmarks, excelling particularly in data-scarce few-shot and zero-shot settings.

1,727

6,736

09 Apr 2025

computer-science computation-and-language machine-learning

A Survey on Mixture of Experts in Large Language Models

The Hong Kong University of Science and Technology (Guangzhou)

This survey paper provides a comprehensive review of Mixture of Experts (MoE) in Large Language Models (LLMs), addressing the computational challenges of scaling and the lack of an up-to-date overview. It presents a novel taxonomy for MoE advancements, detailing key findings across algorithmic design, system optimizations, and diverse applications like NLP, CV, and multimodal systems.

243

1,283

21 Nov 2025

computer-science computer-vision-and-pattern-recognition neural-and-evolutionary-computing

TDSNNs: Competitive Topographic Deep Spiking Neural Networks for Visual Cortex Modeling

The Hong Kong University of Science and Technology (Guangzhou)

The primate visual cortex exhibits topographic organization, where functionally similar neurons are spatially clustered, a structure widely believed to enhance neural processing efficiency. While prior works have demonstrated that conventional deep ANNs can develop topographic representations, these models largely neglect crucial temporal dynamics. This oversight often leads to significant performance degradation in tasks like object recognition and compromises their biological fidelity. To address this, we leverage spiking neural networks (SNNs), which inherently capture spike-based temporal dynamics and offer enhanced biological plausibility. We propose a novel Spatio-Temporal Constraints (STC) loss function for topographic deep spiking neural networks (TDSNNs), successfully replicating the hierarchical spatial functional organization observed in the primate visual cortex from low-level sensory input to high-level abstract representations. Our results show that STC effectively generates representative topographic features across simulated visual cortical areas. While introducing topography typically leads to significant performance degradation in ANNs, our spiking architecture exhibits a remarkably small performance drop (no drop in ImageNet top-1 accuracy, compared to a 3% drop observed in TopoNet, which is the best-performing topographic ANN so far) and outperforms topographic ANNs in brain-likeness. We also reveal that topographic organization facilitates efficient and stable temporal information processing via the spike mechanism in TDSNNs, contributing to model robustness. These findings suggest that TDSNNs offer a compelling balance between computational performance and brain-like features, providing not only a framework for interpreting neuroscience phenomena but also novel insights for designing more efficient and robust deep learning models.

479

31 Aug 2025

ai-for-health computer-science computer-vision-and-pattern-recognition

SegDINO: An Efficient Design for Medical and Natural Image Segmentation with DINO-V3

The Hong Kong University of Science and Technology (Guangzhou)

HKUST

SegDINO introduces an efficient segmentation framework that leverages a frozen DINOv3 Vision Transformer backbone and an uncommonly lightweight, MLP-based decoder to achieve state-of-the-art performance across diverse medical and natural image segmentation tasks with significantly reduced parameter counts and high inference speed. The framework demonstrates improved Dice scores and IoU on datasets like TN3K, Kvasir-SEG, and MSD, while maintaining high efficiency at only 2.21 million trainable parameters and 53 FPS.

103

935

05 Sep 2025

attention-mechanisms computer-science computer-vision-and-pattern-recognition

YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception

Tsinghua University

The Hong Kong University of Science and Technology (Guangzhou)

Shenzhen University

HKUST

Taiyuan University of Technology

Xi'an Jiaotong University

Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)

Beijing Institute of Technology

YOLOv13 enhances real-time object detection by integrating an adaptive hypergraph computation mechanism for high-order visual correlation modeling and a full-pipeline feature distribution paradigm. The approach yields improved detection accuracy on the MS COCO benchmark, with the Nano variant achieving a 1.5% mAP@50:95 increase over YOLOv12-N, while maintaining or reducing computational cost.

3,544

20 Sep 2025

computer-science conversational-ai artificial-intelligence

A Survey of Personalized Large Language Models: Progress and Future Directions

The Chinese University of Hong Kong

The Hong Kong University of Science and Technology (Guangzhou)

Huawei Technologies Co Ltd

This survey paper, from researchers at CUHK, Huawei, HKUST (Guangzhou), and NUS, systematically reviews Personalized Large Language Models (PLLMs), proposing a three-level technical taxonomy and articulating a trilemma among personalization efficacy, computational efficiency, and user privacy. It consolidates existing methods and outlines future research directions for building user-specific AI systems.

321

25 Nov 2025

attention-mechanisms computer-science computer-vision-and-pattern-recognition

VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction

The Hong Kong University of Science and Technology (Guangzhou)

Horizon Robotics

Researchers from The Hong Kong University of Science and Technology (Guangzhou) and Horizon Robotics introduced VGGT4D, a training-free framework that extends the VGGT 3D foundation model to perform robust 4D scene reconstruction by mining implicit motion cues from its global attention layers. This approach achieves state-of-the-art results across various dynamic scene benchmarks for object segmentation, camera pose, and 4D point cloud reconstruction, demonstrating superior performance on long sequences.

723

27 Sep 2025

agents chain-of-thought computer-science

TreeRPO: Tree Relative Policy Optimization

ETH Zurich

Sun Yat-Sen University

The Hong Kong University of Science and Technology (Guangzhou)

MBZUAI

University of California, Merced

HKUST

Zhicheng Yang

TREERPO enhances Large Language Model reasoning by employing a novel tree sampling mechanism to generate fine-grained, step-level reward signals without requiring a separate process reward model. This method improves Pass@1 accuracy by up to 16.5% for Qwen2.5-Math-1.5B and reduces average response length by 18.1% compared to GRPO.

296

15 Sep 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching

National University of Singapore

Shanghai Jiao Tong University

Tsinghua University

University of Electronic Science and Technology of China

The Hong Kong University of Science and Technology (Guangzhou)

Shandong University

SpeCa introduces a "Forecast-then-verify" acceleration framework for Diffusion Transformers, adapting principles from speculative decoding to reduce computational cost while preserving generation quality. This method achieves up to 6.34x acceleration on FLUX.1-dev with minimal quality degradation and 6.16x on HunyuanVideo, consistently outperforming existing approaches.

2,871

05 Dec 2025

computer-science databases

A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?

Tsinghua University

Renmin University of China

The Hong Kong University of Science and Technology (Guangzhou)

Translating users' natural language queries (NL) into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of Text-to-SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of Text-to-SQL techniques powered by LLMs, covering the entire lifecycle from the following four aspects: (1) Model: Text-to-SQL translation techniques that not only tackle NL ambiguity and under-specification, but also properly map NL to the database schema and instances; (2) Data: from the collection of training data and data synthesis to address training data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing Text-to-SQL errors to find the root cause and guiding Text-to-SQL models to evolve. Moreover, we offer a rule of thumb for developing Text-to-SQL solutions. Finally, we discuss the research challenges and open problems of Text-to-SQL in the LLM era. A Text-to-SQL Handbook accompanies the survey.

1,033

587

02 Jun 2025

ai-for-health causal-inference computer-science

Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

The Hong Kong University of Science and Technology (Guangzhou)

Johns Hopkins University

HKUST

University of California San Francisco

The First Affiliated Hospital of Nanjing Medical University

Researchers from Johns Hopkins University, HKUST, and medical institutions develop Medical World Model (MeWM), the first medical AI system to simulate tumor evolution by generating realistic post-treatment CT scans and optimizing interventional protocols. The framework combines vision-language policy models with diffusion-based dynamics simulation and survival analysis. It achieves a 52.38% F1-score in treatment selection (versus 41.97% for GPT-4o) and a 13% improvement in physician decision-making for hepatocellular carcinoma TACE procedures. Radiologists mistook 25-29% of synthetic post-treatment tumors for real images, and the system's risk stratification reached a c-index of 0.752, compared with 0.472 for a traditional Cox model. The work moves visually grounded predictive medicine beyond static diagnosis toward dynamic simulation of treatment outcomes.

2,030

18 Nov 2025

computer-science artificial-intelligence computation-and-language

MoM: Linear Sequence Modeling with Mixture-of-Memories

South China University of Technology

Shanghai AI Laboratory

Nanjing University

The Chinese University of Hong Kong

The Hong Kong University of Science and Technology (Guangzhou)

The Mixture-of-Memories (MoM) architecture replaces a single recurrent state with multiple, independent memory states and a routing mechanism, enhancing linear sequence models' ability to retain information over long sequences. This design enables performance on recall-intensive tasks comparable to Transformer models while maintaining linear time complexity during training and constant time inference.

8,509

10 Nov 2024

computer-science artificial-intelligence computation-and-language

A Survey on Large Language Models for Code Generation

The Hong Kong University of Science and Technology (Guangzhou)

NAVER Cloud

HKUST

This comprehensive survey addresses the critical void of an up-to-date literature review specifically for Large Language Models in natural language to code generation, providing a systematic categorization of advancements and an empirical comparison of leading models. It highlights the narrowing performance gap between open and closed-source models and underscores the importance of instruction tuning and ethical alignment for practical applications.

1,211

13 Apr 2025

agents chain-of-thought computer-science

HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation

Shanghai Artificial Intelligence Laboratory

The Hong Kong University of Science and Technology (Guangzhou)

HKUST

Ding Wang

火龙

HM-RAG presents a hierarchical multi-agent multimodal Retrieval-Augmented Generation framework designed to integrate knowledge from text, graph, and web sources. This framework achieved state-of-the-art accuracy of 93.73% on the ScienceQA benchmark and 58.55% on CrisisMMD, demonstrating robust multimodal reasoning and an ability to outperform previous models and human experts on specific tasks.

1,865

19 May 2025

attention-mechanisms computer-science artificial-intelligence

TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

Zhejiang University

The Hong Kong University of Science and Technology (Guangzhou)

Griffith University

Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

TIMEMIXER++, developed by researchers from Griffith University, Zhejiang University, and MIT, presents a general-purpose time series pattern machine capable of dynamically capturing patterns across multiple temporal scales and frequency resolutions. The model consistently achieves state-of-the-art performance across 8 diverse time series tasks, including long-term forecasting (reducing MSE on Electricity by 7.3%), imputation (outperforming TimesNet by 25.7% in MSE), and zero-shot forecasting (reducing MSE by 13.1%).

1,513
