SCITIX (SGP) TECH PTE. LTD.
The Chain-of-Alpha framework automates the discovery of interpretable, formulaic alpha factors for quantitative trading by employing a dual-chain architecture driven by Large Language Models. This system consistently outperformed both traditional and prior LLM-based methods on China A-share market data, yielding higher annualized returns (e.g., 0.1324 on CSI 500) and information ratios (e.g., 1.4178 on CSI 500) while demonstrating superior efficiency.
Mask-Enhanced Autoregressive Prediction (MEAP) improves the in-context retrieval and long-context reasoning of decoder-only Large Language Models by integrating masked tokens into the standard next-token prediction objective. This method achieves significant data efficiency, outperforming standard next-token prediction by up to 3x on retrieval tasks, while maintaining computational efficiency and reducing contextual hallucinations.
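The core idea of MEAP, combining input-side masking with the unchanged next-token objective, can be sketched as follows. This is a minimal illustration, not the paper's implementation; `MASK_ID` and the masking ratio are assumed placeholders, and real vocabularies define their own mask-token id.

```python
import random

MASK_ID = 0  # hypothetical mask-token id; actual ids are tokenizer-specific


def meap_inputs(tokens, mask_ratio=0.15, seed=0):
    """Build (inputs, targets) pairs for masked next-token prediction.

    Targets are the standard shift-by-one next-token labels; a random
    subset of *input* positions is replaced by MASK_ID, so the model
    must predict each next token without seeing part of its context.
    """
    rng = random.Random(seed)
    inputs = tokens[:-1]   # positions 0..n-2 condition the model
    targets = tokens[1:]   # position i is trained to predict token i+1
    masked = [MASK_ID if rng.random() < mask_ratio else t for t in inputs]
    return masked, targets
```

Because the loss is still plain next-token cross-entropy over the (partially masked) inputs, the approach keeps the decoder-only training pipeline unchanged.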
The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5-billion-parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.
The HeteroSpec framework, developed by researchers from Peking University and SCITIX, improves large language model inference by dynamically managing speculative decoding resources based on contextual predictability. It achieves an average 4.24x decoding speedup over vanilla autoregressive decoding and reduces target model verification tokens by up to 22.79% compared to state-of-the-art methods like EAGLE-3.
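Speculative decoding, the mechanism HeteroSpec builds on, can be sketched with the standard greedy verification rule: a draft model proposes several tokens, and the target model accepts the longest prefix it would itself have generated. This is a simplified illustration only; HeteroSpec's specific contribution, adapting the draft length to contextual predictability, is not reproduced here, and the `target_next_token` callable is an assumed stand-in for a real target-model forward pass.

```python
def verify_draft(draft_tokens, target_next_token, context):
    """Greedy verification step of speculative decoding.

    Accepts the longest prefix of draft_tokens matching the target
    model's own greedy choices; on the first mismatch, the target's
    correction token ends the round. On a full accept, one bonus
    token from the target is appended.

    target_next_token(context) -> the target model's greedy next token.
    """
    accepted = []
    ctx = list(context)
    for t in draft_tokens:
        expected = target_next_token(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # correction ends this round
            break
    else:
        accepted.append(target_next_token(ctx))  # bonus token
    return accepted
```

Each accepted prefix lets one expensive target-model verification pass emit multiple tokens, which is the source of the reported decoding speedups.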
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks by understanding input information and predicting corresponding outputs. However, the internal mechanisms by which LLMs comprehend input and make effective predictions remain poorly understood. In this paper, we explore the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory. We propose a non-training construction strategy to define a task space and identify the following key findings: (1) LLMs compress input information into specific task spaces (e.g., sentiment space, topic space) to facilitate task understanding; (2) they then extract and utilize relevant information from the task space at critical moments to generate accurate predictions. Based on these insights, we introduce two novel approaches: Information Compression-based In-Context Learning (IC-ICL) and Task-Space-guided Fine-Tuning (TS-FT). IC-ICL enhances reasoning performance and inference efficiency by compressing retrieved example information into the task space. TS-FT employs a space-guided loss to fine-tune LLMs, encouraging the learning of more effective compression and selection mechanisms. Experiments across multiple datasets validate the effectiveness of task space construction. Additionally, IC-ICL not only improves performance but also accelerates inference speed by over 40%, while TS-FT achieves superior results with a minimal strategy adjustment.