Ouro, a family of Looped Language Models (LoopLMs), embeds iterative computation directly into the pre-training process through parameter reuse, leading to enhanced parameter efficiency and reasoning abilities. These models achieve the performance of much larger non-looped Transformers while demonstrating improved safety and a more causally faithful internal reasoning process.
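As a rough picture of what embedding iterative computation through parameter reuse means, the sketch below applies one weight-tied Transformer block for a fixed number of recurrence steps, so effective depth grows without adding parameters. The dimensions, loop count, and omission of causal masking are illustrative simplifications, not Ouro's actual configuration.

```python
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """One pre-norm Transformer block whose weights are reused on every loop.
    Causal masking is omitted here for brevity."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class LoopedLM(nn.Module):
    """Applies one shared block n_loops times: iterative depth, fixed parameters."""
    def __init__(self, vocab: int = 1000, d_model: int = 256, n_loops: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.block = SharedBlock(d_model)    # the only Transformer weights
        self.n_loops = n_loops
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)
        for _ in range(self.n_loops):        # parameter reuse across steps
            x = self.block(x)
        return self.head(x)

logits = LoopedLM()(torch.randint(0, 1000, (2, 16)))   # (2, 16, 1000)
```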
This comprehensive survey from a large multi-institutional collaboration examines "Latent Reasoning" in Large Language Models, an emerging paradigm that performs multi-step inference entirely within the model's high-bandwidth continuous hidden states to overcome the limitations of natural-language explicit reasoning. It highlights the significant bandwidth advantage of latent representations (approximately 2,700× that of a discrete token stream) and provides a unified taxonomy of current methodologies.
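The ~2,700× figure is essentially a per-step information-capacity ratio. A back-of-envelope version, assuming a 2560-dimensional FP16 hidden state and a ~32K-token vocabulary (numbers chosen to reproduce the survey's order of magnitude):

```python
import math

hidden_dim, bits_per_value = 2560, 16        # one FP16 hidden-state vector
latent_bits = hidden_dim * bits_per_value    # 40,960 bits carried per step
token_bits = math.log2(32_768)               # ~15 bits per discrete token
print(latent_bits / token_bits)              # ~2731x capacity per step
```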
Researchers at the University of Toronto, Westlake University, and the University of Electronic Science and Technology of China, along with a global consortium, developed aiXiv, an open-access ecosystem designed for AI-generated scientific content and human-AI collaboration. This platform, featuring a multi-agent review system and iterative refinement, raised the acceptance rate of AI-generated proposals from 0% to 45.2% and papers from 10% to 70% in multi-AI voting, demonstrating enhanced quality and trustworthiness.
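A hedged sketch of the propose-review-revise loop such a pipeline implies, with hypothetical stand-ins for the review and revision agents (this is not aiXiv's actual API):

```python
from typing import Callable, List, Tuple

def refine_until_accepted(
    draft: str,
    reviewers: List[Callable[[str], Tuple[bool, str]]],  # each returns (vote, critique)
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
    quorum: float = 0.5,
) -> Tuple[str, bool]:
    """Iterate until a majority of review agents vote to accept."""
    for _ in range(max_rounds):
        votes, critiques = zip(*(review(draft) for review in reviewers))
        if sum(votes) / len(votes) > quorum:
            return draft, True
        draft = revise(draft, list(critiques))  # fold critiques into a revision
    return draft, False

# Toy usage: a stub reviewer that accepts once limitations are discussed.
_, accepted = refine_until_accepted(
    "We propose X.",
    reviewers=[lambda d: ("limitations" in d, "Discuss limitations.")],
    revise=lambda d, critiques: d + " We also discuss limitations.",
)
```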
Researchers from a global consortium, including Tianjin University and Huawei Noah’s Ark Lab, developed Embodied Arena, a comprehensive platform for evaluating Embodied AI agents, featuring a systematic capability taxonomy and an automated, LLM-driven data generation pipeline. This platform integrates over 22 benchmarks and 30 models, revealing that specialized embodied models often outperform general models on targeted tasks and identifying object and spatial perception as key performance bottlenecks.
Researchers at Alibaba Group and collaborating universities developed the D-CPT Law, a predictive framework that optimizes the data mixture ratio for domain-specific continual pre-training of Large Language Models. This law accurately predicts performance based on model size, dataset size, and data composition, leading to reduced computational costs and improved domain adaptation without exhaustive grid-searching.
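The workflow a law like this enables: run a small grid of cheap pilot runs, fit a parametric loss surface, then choose the mixture ratio analytically. The sketch below uses a generic Chinchilla-style ansatz with an added mixture-ratio term and data synthesized from that ansatz; the paper's actual functional form and coefficients differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_law(X, E, A, alpha, B, beta, C, gamma):
    """Illustrative ansatz: irreducible loss + model-size, data-size, and
    domain-mixture-ratio terms (not the paper's exact parameterization)."""
    N, D, r = X
    return E + A / N**alpha + B / D**beta + C / (r + 1e-3)**gamma

# Small (N, D, r) grid standing in for cheap pilot runs.
N, D, r = (g.ravel() for g in np.meshgrid([1e8, 4e8, 1e9],
                                          [1e10, 3e10],
                                          [0.2, 0.5, 0.8]))
rng = np.random.default_rng(0)
L = loss_law((N, D, r), 1.7, 400.0, 0.32, 900.0, 0.28, 0.05, 0.6)
L = L + rng.normal(0.0, 0.005, L.shape)   # synthetic stand-in for measured losses

popt, _ = curve_fit(loss_law, (N, D, r), L,
                    p0=[1.5, 100, 0.3, 100, 0.3, 0.1, 0.5], maxfev=50_000)

# With the surface fitted, sweep r at the target (N, D) budget and take the
# argmin instead of grid-searching full continual pre-training runs.
ratios = np.linspace(0.05, 0.95, 19)
best_r = ratios[np.argmin(loss_law((1e9, 3e10, ratios), *popt))]
```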
This review paper from Huawei RAMS Lab, the University of Pisa, and other institutions systematically analyzes the safety challenges of World Models for embodied AI agents in autonomous driving and robotics. It identifies and quantifies recurring safety-critical failures, termed "pathologies," across state-of-the-art models, revealing consistent shortcomings in areas like physical conformity, temporal consistency, and traffic adherence.
Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored. We introduce OmniBench, a novel benchmark designed to evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs). OmniBench features high-quality human annotations that require integrated understanding across all modalities. Our evaluation reveals that: i) open-source OLMs show significant limitations in instruction following and reasoning in tri-modal contexts; and ii) most baseline models perform poorly (around 50% accuracy) even when given textual alternatives to the image/audio inputs. To address these limitations, we develop OmniInstruct, a 96K-sample instruction-tuning dataset for training OLMs. We advocate for developing more robust tri-modal integration techniques and training strategies to enhance OLM performance. Code and data can be found at our repo (this https URL).
MIRA, a medical time-series foundation model developed by researchers at Microsoft Research and collaborating universities, is designed to forecast patient health trajectories from real-world data characterized by irregular intervals, heterogeneous sampling rates, and frequent missing values. It achieved state-of-the-art performance on seven unseen benchmarks, reducing RMSE by an average of 8% compared to leading baselines, and remained robust with missing-data rates of up to 90%.
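One common way models cope with the irregular sampling described here is to encode each observation's real elapsed time (rather than its integer position) and to pass an explicit missingness mask. The sketch below illustrates that general pattern, not MIRA's actual architecture:

```python
import numpy as np

def time_features(timestamps: np.ndarray, d: int = 16) -> np.ndarray:
    """Sinusoidal embedding of continuous timestamps (e.g. hours since admission)."""
    freqs = 1.0 / (10_000 ** (np.arange(0, d, 2) / d))
    angles = timestamps[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

t = np.array([0.0, 1.5, 1.9, 7.2, 30.0])        # uneven measurement times
x = np.array([5.1, np.nan, 4.8, 5.4, np.nan])   # lab values with gaps
mask = ~np.isnan(x)                             # explicit observation mask
feats = np.concatenate([np.nan_to_num(x)[:, None],
                        mask[:, None].astype(float),
                        time_features(t)], axis=1)   # (5, 18) per-step features
```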
Researchers developed Deliberate Practice Policy Optimization (DPPO), a metacognitive training framework that integrates Reinforcement Learning and Supervised Fine-tuning to build embodied intelligence. The resulting Pelican-VL 1.0 model (72B parameters) achieved a 20.3% performance improvement over its base model and outperformed several 200B-level closed-source models on various embodied tasks.
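A skeleton of what an alternating explore/consolidate schedule can look like is sketched below; every function is a hypothetical stand-in, and the actual Pelican-VL recipe is substantially more involved.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Episode:
    prompt: str
    response: str
    reward: float

def rl_step(model, episodes: List[Episode]) -> None:
    """Policy-gradient update on self-generated rollouts (stub)."""

def sft_step(model, corrections: List[Episode]) -> None:
    """Supervised fine-tuning on corrected failed rollouts (stub)."""

def deliberate_practice(model,
                        rollout: Callable[[object], List[Episode]],
                        correct: Callable[[Episode], Episode],
                        cycles: int = 3,
                        threshold: float = 0.5) -> None:
    """Alternate RL exploration with SFT consolidation of fixed failures."""
    for _ in range(cycles):
        episodes = rollout(model)                         # explore (RL phase)
        rl_step(model, episodes)
        failures = [e for e in episodes if e.reward < threshold]
        sft_step(model, [correct(e) for e in failures])   # consolidate (SFT phase)
```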
FeynCraft presents an interactive, browser-based game designed for students to learn and practice drawing valid Standard Model Feynman diagrams. The platform integrates real-time rule-based validation and pedagogical visualization overlays, improving comprehension and correct application of particle physics interaction rules.
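The kind of rule such a validator must enforce is concrete: conserved quantities have to balance at every vertex. A minimal illustration checking electric charge (in units of e/3) and lepton number, with a toy particle table and an invented API rather than FeynCraft's internals:

```python
# (charge in units of e/3, lepton number); antiparticles flip both signs.
PARTICLES = {
    "e-": (-3, +1), "e+": (+3, -1), "nu_e": (0, +1),
    "photon": (0, 0), "W-": (-3, 0), "W+": (+3, 0),
    "u": (+2, 0), "d": (-1, 0),
}

def vertex_ok(incoming: list, outgoing: list) -> bool:
    """True if charge and lepton number balance across the vertex."""
    def totals(names):
        return tuple(sum(PARTICLES[n][i] for n in names) for i in (0, 1))
    return totals(incoming) == totals(outgoing)

print(vertex_ok(["e-"], ["e-", "photon"]))    # True: valid QED vertex
print(vertex_ok(["e-"], ["nu_e", "photon"]))  # False: charge not conserved
print(vertex_ok(["e-"], ["nu_e", "W-"]))      # True: charged-current weak vertex
```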
A framework called Web-CogReasoner enables web agents to learn and apply factual, conceptual, and procedural knowledge through a human-inspired curriculum. This structured approach, using the Qwen2.5-VL-7B LMM, leads to an 84.4% accuracy on cognitive reasoning tasks and establishes a new state-of-the-art for open-source agents on the WebVoyager benchmark with a 30.2% success rate.
This survey provides a comprehensive overview of how large multimodal language models are transforming scientific discovery, experimentation, content generation, and evaluation. It maps current advancements, limitations, and ethical considerations across five stages of the research cycle, identifying specific AI applications and their impact on scientific workflows.
Deep neural networks, when trained for next-token prediction, spontaneously learn to represent beliefs over minimal generative models of stochastic processes, including those optimally described by quantum or post-quantum theories. This universal capability, observed across Transformers, LSTMs, GRUs, and RNNs, allows classical networks to achieve memory compression advantages typically associated with non-classical computational systems.
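The object such networks are claimed to represent is the Bayesian belief state of an observer tracking the hidden process: for a hidden Markov model with per-symbol transition matrices T[x], the optimal predictor's belief over hidden states updates as b' = b·T[x], normalized. A toy version of that recursion, with an illustrative 2-state, 2-symbol process:

```python
import numpy as np

# T[x][i, j] = P(emit symbol x, move to state j | current state i);
# summed over x and j, each state's outgoing probability is 1.
T = {
    0: np.array([[0.45, 0.05],
                 [0.10, 0.10]]),
    1: np.array([[0.05, 0.45],
                 [0.40, 0.40]]),
}

def update(belief: np.ndarray, symbol: int) -> np.ndarray:
    """One Bayes-filter step: condition the state distribution on an emission."""
    unnorm = belief @ T[symbol]
    return unnorm / unnorm.sum()

b = np.array([0.5, 0.5])       # prior over hidden states
for x in [0, 1, 1, 0]:         # an observed symbol sequence
    b = update(b, x)
print(b)                       # the belief an optimal predictor must encode
```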
Researchers at the University of Manchester and Technion – Israel Institute of Technology developed "Logit Flow" to analyze information flow in LLMs, revealing a four-stage single-hop reasoning process and identifying conflicting logit interference as a failure point in multi-hop reasoning. Informed by these insights, they propose "Back Attention," a lightweight architectural modification that improved LLM accuracy on various reasoning datasets and enabled smaller models to achieve performance comparable to larger counterparts.
Researchers from the IAA SETI Committee updated the "Declaration of Principles Concerning Activities Following the Detection of Extraterrestrial Intelligence" through a multi-year, interdisciplinary, and consultative process. The revised protocols broaden the scope to all technosignatures, address modern communication challenges, and establish rigorous guidelines for verification, public communication, and international consultation on a potential reply to extraterrestrial intelligence.
FLAG-TRADER introduces a framework integrating Large Language Models with gradient-based Reinforcement Learning for financial trading, where the LLM acts as the policy network. The framework enables small-scale, open-source LLMs to achieve superior performance in single-asset trading tasks compared to larger, proprietary models by effectively leveraging RL-based fine-tuning for sequential decision-making.
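The core wiring, stripped to essentials: the model's final hidden state parameterizes a softmax policy over discrete trading actions, trained with a policy gradient. Below, a tiny GRU stands in for the LLM and rewards are toy per-episode P&L values; none of this reflects FLAG-TRADER's actual scale or reward design.

```python
import torch
import torch.nn as nn

ACTIONS = ["BUY", "HOLD", "SELL"]

class PolicyLM(nn.Module):
    """Stand-in for an LLM whose last hidden state feeds an action head."""
    def __init__(self, d: int = 32):
        super().__init__()
        self.encoder = nn.GRU(1, d, batch_first=True)
        self.head = nn.Linear(d, len(ACTIONS))

    def forward(self, prices: torch.Tensor) -> torch.distributions.Categorical:
        _, h = self.encoder(prices.unsqueeze(-1))   # h: (1, batch, d)
        return torch.distributions.Categorical(logits=self.head(h[-1]))

model = PolicyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

prices = torch.randn(4, 10)     # 4 episodes of 10 normalized price steps
dist = model(prices)
actions = dist.sample()         # sampled BUY/HOLD/SELL per episode
rewards = torch.randn(4)        # toy per-episode P&L (stand-in)
loss = -(dist.log_prob(actions) * rewards).mean()   # REINFORCE objective
opt.zero_grad(); loss.backward(); opt.step()
```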
Recent studies have explored the use of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) for Knowledge Graph Question Answering (KGQA). They typically require rewriting retrieved subgraphs into natural language formats comprehensible to LLMs. However, when tackling complex questions, the knowledge rewritten by existing methods may include irrelevant information, omit crucial details, or fail to align with the question's semantics. To address these issues, we propose CoTKR (Chain-of-Thought Enhanced Knowledge Rewriting), a novel rewriting method that generates reasoning traces and the corresponding knowledge in an interleaved manner, thereby mitigating the limitations of single-step knowledge rewriting. Additionally, to bridge the preference gap between the knowledge rewriter and the question answering (QA) model, we propose PAQAF (Preference Alignment from Question Answering Feedback), a training strategy that leverages feedback from the QA model to further optimize the knowledge rewriter. We conduct experiments using various LLMs across several KGQA benchmarks. Experimental results demonstrate that, compared with previous knowledge rewriting methods, CoTKR generates the most beneficial knowledge representation for QA models, significantly improving the performance of LLMs on KGQA.
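A concrete picture of the interleaved format the abstract describes, with an invented question and facts for illustration: rather than rewriting the whole subgraph at once, the rewriter alternates each reasoning step with exactly the knowledge that step needs.

```python
question = "Who directed the film that won Best Picture in 1998?"

# Interleaved rewriting: a reasoning step, then the knowledge it requires.
interleaved = """\
Reason 1: I need the film that won Best Picture in 1998.
Knowledge 1: Titanic won the Academy Award for Best Picture in 1998.
Reason 2: I need the director of that film.
Knowledge 2: Titanic was directed by James Cameron.
"""

qa_prompt = f"{interleaved}\nQuestion: {question}\nAnswer:"
print(qa_prompt)
```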
The design of the humanoid ankle is critical for safe and efficient ground interaction. Key factors such as mechanical compliance and motor mass distribution have driven the adoption of parallel mechanism architectures. However, selecting the optimal configuration depends on both actuator availability and task requirements. We propose a unified methodology for the design and evaluation of parallel ankle mechanisms. A multi-objective optimization synthesizes the mechanism geometry, and the resulting solutions are evaluated with a scalar cost function that aggregates key performance metrics for cross-architecture comparison. We focus on two representative architectures: the Spherical-Prismatic-Universal (SPU) and the Revolute-Spherical-Universal (RSU). For both we resolve the kinematics, and for the RSU we introduce a parameterization that ensures workspace feasibility and accelerates optimization. We validate our approach by redesigning the ankle of an existing humanoid robot. The optimized RSU consistently outperforms both the original serial design and a conventionally engineered RSU, reducing the cost function by up to 41% and 14%, respectively.
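The aggregation step that makes cross-architecture comparison possible can be as simple as per-metric normalization followed by a weighted sum; the metric names, weights, and numbers below are illustrative, not the paper's.

```python
import numpy as np

METRICS = ["peak_torque", "moving_mass", "workspace_error"]  # lower is better
WEIGHTS = np.array([0.5, 0.3, 0.2])

def scalar_cost(candidates: np.ndarray) -> np.ndarray:
    """candidates: (n_designs, n_metrics). Min-max normalize each metric,
    then weight-sum so SPU and RSU designs land on one comparable scale."""
    lo, hi = candidates.min(axis=0), candidates.max(axis=0)
    normalized = (candidates - lo) / np.where(hi > lo, hi - lo, 1.0)
    return normalized @ WEIGHTS

designs = np.array([[12.0, 1.8, 0.02],   # optimized RSU (toy numbers)
                    [15.0, 2.1, 0.03],   # conventionally engineered RSU
                    [20.0, 2.5, 0.02]])  # original serial design
print(scalar_cost(designs))              # choose the argmin design
```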