University of Sheffield
KBASS introduces a robust framework for discovering governing equations from data, combining kernel learning with Bayesian spike-and-slab priors and efficient tensor algebra. This approach consistently recovers ground-truth equations from sparse and noisy data, outperforming state-of-the-art methods like SINDy, PINN-SR, and BSL while providing principled uncertainty quantification and improved computational efficiency.
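As context for the equation-discovery setting, a minimal sparse-regression sketch follows. It is closer in spirit to the SINDy baseline mentioned above than to KBASS itself (which replaces the plain least-squares fit with kernel learning and Bayesian spike-and-slab priors), and the ODE, candidate library, and threshold are illustrative assumptions.

```python
# Minimal sketch of library-based equation discovery (in the spirit of the
# SINDy baseline; KBASS itself uses kernels + Bayesian spike-and-slab priors).
# The ODE, library, and threshold below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulate noisy observations of dx/dt = -2x + 0.5x^3 on a time grid.
t = np.linspace(0, 2, 400)
dt = t[1] - t[0]
x = np.empty_like(t)
x[0] = 1.0
for i in range(len(t) - 1):                      # simple Euler integration
    x[i + 1] = x[i] + dt * (-2.0 * x[i] + 0.5 * x[i] ** 3)
x_noisy = x + 1e-3 * rng.standard_normal(x.shape)

# Numerical derivative and a library of candidate terms [1, x, x^2, x^3].
dxdt = np.gradient(x_noisy, dt)
library = np.column_stack([np.ones_like(x_noisy), x_noisy, x_noisy**2, x_noisy**3])
names = ["1", "x", "x^2", "x^3"]

# Sequential thresholded least squares: prune small coefficients, refit.
w = np.linalg.lstsq(library, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(w) < 0.1                      # sparsity threshold (assumed)
    w[small] = 0.0
    active = ~small
    w[active] = np.linalg.lstsq(library[:, active], dxdt, rcond=None)[0]

print({n: round(c, 3) for n, c in zip(names, w) if c != 0.0})
# Expected to recover roughly {'x': -2.0, 'x^3': 0.5}
```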
This survey paper defines and systematically reviews the emerging paradigm of self-evolving AI agents, which bridge static foundation models with dynamic lifelong adaptability. It introduces a unified conceptual framework and a comprehensive taxonomy of evolution techniques, mapping the progression towards continuous self-improvement in AI systems.
California Institute of Technology · University of Oslo · University of Cambridge · University of Victoria · Chinese Academy of Sciences · University of Zurich · Tel Aviv University · University of Oxford · University of Science and Technology of China · Scuola Normale Superiore · University of Copenhagen · University of Edinburgh · The University of Texas at Austin · INFN · ETH Zürich · Yonsei University · University of Crete · Kavli Institute for the Physics and Mathematics of the Universe · Universität Heidelberg · University of Maryland · Universidad Autónoma de Madrid · Université Paris-Saclay · Stockholm University · University of Helsinki · University of Arizona · University of Western Australia · University of Sheffield · Princeton University · University of Geneva · University of Portsmouth · University of Iceland · Università di Genova · Universidade do Porto · University of Sussex · INAF · Aix Marseille University · Niels Bohr Institute · University of Jyväskylä · University of Padova · Jet Propulsion Laboratory · Jagiellonian University · Instituto de Astrofísica de Canarias · University of the Witwatersrand · University of Nottingham · European Space Agency · University of Cape Town · SISSA · Nicolaus Copernicus Astronomical Center · Observatoire de la Côte d’Azur · University of Hawai’i · University of KwaZulu-Natal · Ludwig-Maximilians-Universität · Laboratoire d’Astrophysique de Marseille · INAF-Istituto di Radioastronomia · INAF – Osservatorio Astronomico di Roma · Institut de Física d’Altes Energies (IFAE) · Laboratoire de Physique des 2 Infinis Irène Joliot-Curie · Osservatorio Astronomico della Regione Autonoma Valle d’Aosta · INAF - Osservatorio Astrofisico di Catania · INAF - Osservatorio Astronomico di Arcetri · Institut d’Astrophysique Spatiale · NASA · DTU Space · The Queen’s University of Belfast · Instituto de Astrofísica e Ciências do Espaço, Universidade de Lisboa · IRAP, Université de Toulouse, CNRS, CNES · ETH, Institute for Astronomy · INAF-IASF, Bologna · Cosmic Dawn Center (DAWN) · Università degli Studi di Ferrara · Université de Paris · Université Claude Bernard Lyon 1 · Excellence Cluster ‘Origins’ · Université de Lyon · Università di Pisa · IFCA-CSIC-UC · INAF Osservatorio Astronomico di Padova · Università degli Studi di Firenze · Université de Montpellier · Università degli Studi di Napoli Federico II · Università di Roma Tor Vergata · INAF Osservatorio di Astrofisica e Scienza dello Spazio di Bologna · Università di Bologna · INAF – Osservatorio Astronomico di Trieste · Università degli Studi di Trieste
Verifying the fully kinematic nature of the cosmic microwave background (CMB) dipole is of fundamental importance in cosmology. In the standard cosmological model, with the Friedmann-Lemaître-Robertson-Walker (FLRW) metric arising from the inflationary expansion, the CMB dipole should be entirely kinematic. Any non-kinematic CMB dipole component would thus reflect the preinflationary structure of spacetime, probing the extent of FLRW applicability. Cosmic backgrounds from galaxies formed after matter-radiation decoupling should have a kinematic dipole component identical in velocity to the CMB kinematic dipole. Comparing the two can therefore isolate the CMB non-kinematic dipole. It was recently proposed that such a measurement can be done using the near-IR cosmic infrared background (CIB) measured with the currently operating Euclid telescope, and later with Roman. The proposed method reconstructs the resolved CIB, the Integrated Galaxy Light (IGL), from Euclid's Wide Survey and probes its dipole, whose kinematic component is amplified over that of the CMB by the Compton-Getting effect. The amplification, coupled with the extensive galaxy samples forming the IGL, would determine the CIB dipole with an overwhelming signal-to-noise ratio, isolating its direction to sub-degree accuracy. We develop details of the method for Euclid's Wide Survey in 4 bands spanning 0.6 to 2 microns. We isolate the systematic and other uncertainties and present methodologies to minimize them, after confining the sample to the magnitude range with negligible IGL/CIB dipole from galaxy clustering. These include the required star-galaxy separation, accounting for the extinction-correction dipole using a method newly developed here that achieves total separation, and accounting for the Earth's orbital motion and other systematic effects. (Abridged)
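As context for the amplification mentioned above, the textbook Compton-Getting relation for a background with a power-law spectrum is sketched below; the relation itself is standard, but the spectral index appropriate to the near-IR IGL is not quoted from the paper and would have to come from the measurement itself.

```latex
% Kinematic (Compton-Getting) dipole of a background with power-law spectrum
% I_nu \propto nu^alpha: an observer moving with velocity v sees a dipole
% whose amplitude is amplified by (3 - alpha) relative to v/c, whereas the
% CMB temperature dipole amplitude is simply (v/c) T_0.
\[
  \frac{\delta I_{\nu}}{I_{\nu}} \;=\; (3 - \alpha)\,\frac{v}{c}\,\cos\theta ,
  \qquad I_{\nu} \propto \nu^{\alpha},
\]
% where theta is the angle between the line of sight and the velocity.
```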
ETH Zurich logoETH ZurichKAIST logoKAISTUniversity of Washington logoUniversity of WashingtonRensselaer Polytechnic InstituteGoogle DeepMind logoGoogle DeepMindUniversity of Amsterdam logoUniversity of AmsterdamUniversity of Illinois at Urbana-Champaign logoUniversity of Illinois at Urbana-ChampaignUniversity of Cambridge logoUniversity of CambridgeHeidelberg UniversityUniversity of Waterloo logoUniversity of WaterlooFacebookCarnegie Mellon University logoCarnegie Mellon UniversityUniversity of Southern California logoUniversity of Southern CaliforniaGoogle logoGoogleNew York University logoNew York UniversityUniversity of StuttgartUC Berkeley logoUC BerkeleyNational University of Singapore logoNational University of SingaporeUniversity College London logoUniversity College LondonUniversity of Oxford logoUniversity of OxfordLMU MunichShanghai Jiao Tong University logoShanghai Jiao Tong UniversityUniversity of California, Irvine logoUniversity of California, IrvineTsinghua University logoTsinghua UniversityStanford University logoStanford UniversityUniversity of Michigan logoUniversity of MichiganUniversity of Copenhagen logoUniversity of CopenhagenThe Chinese University of Hong Kong logoThe Chinese University of Hong KongUniversity of MelbourneMeta logoMetaUniversity of EdinburghOpenAI logoOpenAIThe University of Texas at Austin logoThe University of Texas at AustinCornell University logoCornell UniversityUniversity of California, San Diego logoUniversity of California, San DiegoYonsei UniversityMcGill University logoMcGill UniversityBoston University logoBoston UniversityUniversity of BambergNanyang Technological University logoNanyang Technological UniversityMicrosoft logoMicrosoftKU Leuven logoKU LeuvenColumbia University logoColumbia UniversityUC Santa BarbaraAllen Institute for AI logoAllen Institute for AIGerman Research Center for Artificial Intelligence (DFKI)University of Pennsylvania logoUniversity of PennsylvaniaJohns Hopkins University logoJohns Hopkins UniversityArizona State University logoArizona State UniversityUniversity of Maryland logoUniversity of MarylandUniversity of Tokyo logoUniversity of TokyoUniversity of North Carolina at Chapel HillHebrew University of JerusalemAmazonTilburg UniversityUniversity of Massachusetts AmherstUniversity of RochesterUniversity of Duisburg-EssenSapienza University of RomeUniversity of SheffieldPrinceton University logoPrinceton UniversityHKUST logoHKUSTUniversity of TübingenTU BerlinSaarland UniversityTechnical University of DarmstadtUniversity of HaifaUniversity of TrentoUniversity of MontrealBilkent UniversityUniversity of Cape TownBar Ilan UniversityIBMUniversity of MannheimServiceNow logoServiceNowPotsdam UniversityPolish-Japanese Academy of Information TechnologySalesforceASAPPAI21 LabsValencia Polytechnic UniversityUniversity of Trento, Italy
A large-scale and diverse benchmark, BIG-bench, was introduced to rigorously evaluate the capabilities and limitations of large language models across 204 tasks. The evaluation revealed that even state-of-the-art models currently achieve aggregate scores below 20 (on a 0-100 normalized scale), indicating significantly lower performance compared to human experts.
This research redefines dense latents in Sparse Autoencoders (SAEs) from perceived training artifacts to functional features, demonstrating they reflect intrinsic, frequently activating computations within large language models. The study reveals these latents perform diverse, interpretable roles, including tracking token position, binding contextual information, and regulating output entropy, and persist across different model architectures.
This work introduces MedVLM-R1, a medical Vision-Language Model (VLM) that leverages Group Relative Policy Optimization (GRPO) to incentivize explicit natural language reasoning in radiology tasks. The model achieves an average accuracy of 78.22% across MRI, CT, and X-ray modalities, outperforming larger models and demonstrating robust generalization to out-of-distribution data while generating interpretable reasoning steps.
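As a pointer to the training recipe, a minimal sketch of the group-relative advantage computation at the core of GRPO follows; the rewards and group size are illustrative, not values from MedVLM-R1.

```python
# Minimal sketch of GRPO-style group-relative advantages: sample a group of
# completions for the same prompt, score each, and normalize rewards within
# the group. Rewards and group size here are illustrative assumptions.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Advantage of each completion relative to its own sampling group."""
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + eps)

# e.g. 6 completions for one radiology question: 1.0 = correct answer with a
# well-formed reasoning block, partial credit otherwise.
rewards = np.array([1.0, 0.0, 1.0, 0.2, 0.0, 1.0])
adv = group_relative_advantages(rewards)
print(np.round(adv, 3))   # positive for above-average completions, negative otherwise

# In GRPO these advantages weight the token-level policy-gradient objective
# (with a clipped ratio and a KL penalty to a reference model), avoiding a
# separately trained value network.
```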
Researchers investigate cross-generator image forgery detection, demonstrating that a frozen, vision-only DINOv3 foundation model achieves strong generalization by leveraging global, low-frequency structural inconsistencies between real and fake images. The proposed Fisher-Guided Token Selection (FGTS) framework, built on DINOv3, establishes new state-of-the-art accuracy across multiple benchmarks with minimal supervision.
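A hypothetical sketch of Fisher-guided selection in this spirit: score each patch token by a diagonal-Fisher-style quantity and keep the top-k for a lightweight real/fake head. The scoring rule, the stand-in features, and k are assumptions, not the paper's exact FGTS procedure.

```python
# Hypothetical sketch of Fisher-guided token selection: score each patch token
# by a diagonal-Fisher-style quantity (squared gradient of the loss w.r.t. the
# token embedding) and keep only the top-k tokens for a lightweight real/fake
# head. A random tensor stands in for frozen DINOv3 patch tokens.
import torch

torch.manual_seed(0)
B, N, D, K = 4, 196, 768, 32          # batch, patch tokens, dim, tokens kept

tokens = torch.randn(B, N, D, requires_grad=True)   # stand-in for frozen features
labels = torch.randint(0, 2, (B,)).float()          # 1 = fake, 0 = real
probe = torch.nn.Linear(D, 1)                       # lightweight binary head

# Fisher-style score: squared gradient of the loss w.r.t. each token embedding.
logits = probe(tokens).mean(dim=1).squeeze(-1)      # pool over tokens
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
(grad,) = torch.autograd.grad(loss, tokens)
fisher_score = grad.pow(2).sum(dim=-1)              # (B, N) per-token score

# Keep the k highest-scoring tokens per image and classify from those only.
topk = fisher_score.topk(K, dim=1).indices
selected = torch.gather(tokens, 1, topk.unsqueeze(-1).expand(-1, -1, D))
selected_logits = probe(selected.detach()).mean(dim=1).squeeze(-1)
print(selected.shape, selected_logits.shape)        # (4, 32, 768) and (4,)
```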
Researchers identified two types of neurons in large language models, "entropy neurons" and "token frequency neurons," which regulate next-token prediction uncertainty. Entropy neurons modulate output entropy by leveraging LayerNorm and the unembedding matrix's null space, while token frequency neurons adjust the output distribution relative to empirical token frequencies.
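The entropy-neuron mechanism can be illustrated with a small numpy sketch: a write along a direction in the unembedding's null space carries no token-specific information, yet the final normalization rescales the whole residual stream, shrinking every logit and raising entropy. The toy dimensions and unembedding below are assumptions, and RMS normalization stands in for LayerNorm for brevity.

```python
# Minimal numpy sketch of the entropy-neuron mechanism described above: a
# neuron writes along a direction in the (effective) null space of the
# unembedding matrix W_U; this leaves logit differences untouched before
# normalization, but the final norm rescales the residual stream, shrinking
# all logits and raising the entropy of the next-token distribution.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

# Build an unembedding with an explicit null direction: the last residual
# dimension maps to zero logits.
W_U = rng.standard_normal((d_model, vocab))
W_U[-1, :] = 0.0
null_dir = np.zeros(d_model); null_dir[-1] = 1.0

def rmsnorm(x):                      # LayerNorm without mean/bias, for brevity
    return x / np.sqrt((x ** 2).mean())

def entropy(logits):
    p = np.exp(logits - logits.max()); p /= p.sum()
    return -(p * np.log(p)).sum()

x = rng.standard_normal(d_model)                 # residual stream before the final norm
for scale in [0.0, 5.0, 20.0]:                   # "entropy neuron" activation strength
    logits = rmsnorm(x + scale * null_dir) @ W_U
    print(f"activation={scale:5.1f}  entropy={entropy(logits):.3f}")
# Entropy grows with the activation even though the null-space write carries
# no token-specific information.
```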
Language models (LMs) may memorize personally identifiable information (PII) from training data, enabling adversaries to extract it during inference. Existing defense mechanisms such as differential privacy (DP) reduce this leakage but incur large drops in utility. Based on a comprehensive study using circuit discovery to identify the computational circuits responsible for PII leakage in LMs, we hypothesize that specific PII leakage circuits should be responsible for this behavior. We therefore propose PATCH (Privacy-Aware Targeted Circuit PatcHing), a novel approach that first identifies and then directly edits PII circuits to reduce leakage. PATCH achieves a better privacy-utility trade-off than existing defenses, e.g., reducing recall of PII leakage from LMs by up to 65%. Furthermore, PATCH can be combined with DP to reduce the recall of residual leakage of an LM to as low as 0.01%. Our analysis shows that PII leakage circuits persist even after existing defense mechanisms are applied; in contrast, PATCH can effectively mitigate their impact.
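A hypothetical sketch of what directly editing a circuit can look like: zero the output-projection slice of attention heads that circuit discovery flagged, so they can no longer write into the residual stream. The toy module, the flagged heads, and the editing rule are illustrative, not PATCH's exact procedure.

```python
# Hypothetical sketch of targeted circuit patching: given attention heads that
# circuit discovery flagged as carrying PII, edit the model by zeroing the
# slice of the output projection corresponding to those heads, so their
# contribution to the residual stream is removed.
import torch
import torch.nn as nn

n_heads, d_head = 8, 32
d_model = n_heads * d_head
attn_out = nn.Linear(d_model, d_model, bias=False)   # per-layer W_O

# Suppose circuit discovery flagged these heads as PII-leaking in this layer.
flagged_heads = [2, 5]

with torch.no_grad():
    for h in flagged_heads:
        # Columns h*d_head:(h+1)*d_head of W_O read from head h's output.
        attn_out.weight[:, h * d_head:(h + 1) * d_head] = 0.0

# The flagged heads can no longer write anything into the residual stream.
x = torch.randn(1, 4, d_model)                   # concatenated head outputs
x_poisoned = x.clone()
x_poisoned[..., 2 * d_head:3 * d_head] = 1e6     # extreme activity in a patched head
print(torch.allclose(attn_out(x), attn_out(x_poisoned)))   # True: head 2 is silenced
```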
Before executing a quantum algorithm, one must first decompose the algorithm into machine-level instructions compatible with the architecture of the quantum computer, a process known as quantum compiling. There are many different quantum circuit decompositions for the same algorithm, but it is desirable to compile leaner circuits. A fundamentally important cost metric is the T count -- the number of T gates in a circuit. For the single-qubit case, optimal compiling is essentially a solved problem. However, multi-qubit compiling is a harder problem, with optimal algorithms requiring classical runtime exponential in the number of qubits. Here, we present and compare several efficient quantum compilers for multi-qubit Clifford+T circuits. We implemented our compilers in C++ and benchmarked them on random circuits, from which we determine that our TODD compiler yields the lowest T counts on average. We also benchmarked TODD on a library of reversible logic circuits that appear in quantum algorithms and found that it reduced the T count for 97% of the circuits, with an average T-count saving of 20% when compared against the best of all previous circuit decompositions.
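To make the cost metric concrete, a small sketch follows that counts T gates and applies a trivial peephole reduction (two consecutive T gates on a qubit merge into the Clifford gate S, and T followed by Tdg cancels). The gate-list representation is an assumption, and real compilers such as TODD rely on far stronger phase-polynomial optimisations than this toy pass.

```python
# Minimal sketch of T-count accounting and a trivial peephole reduction.
def t_count(circuit):
    return sum(1 for gate, _ in circuit if gate in ("T", "Tdg"))

def peephole(circuit):
    """Merge T.T -> S and cancel T.Tdg on each qubit when nothing intervenes."""
    pending = {}          # qubit -> index in `out` of an unmatched T/Tdg
    out = []
    for gate, qubits in circuit:
        qs = qubits if isinstance(qubits, tuple) else (qubits,)
        if gate in ("T", "Tdg"):
            q = qs[0]
            if q in pending:
                i = pending.pop(q)
                prev_gate, _ = out[i]
                if prev_gate == gate:            # T.T = S (and Tdg.Tdg = Sdg)
                    out[i] = ("S" if gate == "T" else "Sdg", q)
                else:                            # T.Tdg = identity
                    out[i] = None
                continue
            pending[q] = len(out)
        else:
            for q in qs:
                pending.pop(q, None)             # another gate touches q: no merge
        out.append((gate, qubits))
    return [g for g in out if g is not None]

circ = [("T", 0), ("H", 1), ("T", 0), ("T", 1), ("CNOT", (0, 1)), ("T", 1), ("Tdg", 1)]
print(t_count(circ), "->", t_count(peephole(circ)))   # 5 -> 1
```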
MERT introduces a general-purpose, computationally affordable, self-supervised acoustic music understanding model that employs a novel multi-task framework with both acoustic and music-specific teachers. It achieves state-of-the-art performance across 14 diverse Music Information Retrieval tasks while being significantly more efficient than prior large generative models.
Autoformalisation, the task of expressing informal mathematical statements in formal language, is often viewed as a direct translation process. This, however, disregards a critical preceding step: conjecturing. Many mathematical problems cannot be formalised directly without first conjecturing a conclusion, such as an explicit answer or a specific bound. Since Large Language Models (LLMs) already struggle with autoformalisation, and the evaluation of their conjecturing ability is limited and often entangled with autoformalisation or proof, it is particularly challenging to understand its effect. To address this gap, we augment existing datasets to create ConjectureBench, and redesign the evaluation framework and metric specifically to measure the conjecturing capabilities of LLMs both as a distinct task and within the autoformalisation pipeline. Our evaluation of foundational models, including GPT-4.1 and DeepSeek-V3.1, reveals that their autoformalisation performance is substantially overestimated when the conjecture is accounted for during evaluation. However, the conjecture should not be assumed to be provided. We design an inference-time method, Lean-FIRe, to improve conjecturing and autoformalisation, which, to the best of our knowledge, achieves the first successful end-to-end autoformalisation of 13 PutnamBench problems with GPT-4.1 and 7 with DeepSeek-V3.1. We demonstrate that while LLMs possess the requisite knowledge to generate accurate conjectures, improving autoformalisation performance requires treating conjecturing as an independent task and investigating further how to integrate it correctly within autoformalisation. Finally, we provide forward-looking guidance to steer future research toward improving conjecturing, an overlooked step of formal mathematical reasoning.
This survey provides a comprehensive overview of how large multimodal language models are transforming scientific discovery, experimentation, content generation, and evaluation. It maps current advancements, limitations, and ethical considerations across five stages of the research cycle, identifying specific AI applications and their impact on scientific workflows.
Researchers introduce RAGate, an adaptive mechanism that dynamically determines when to augment conversational system responses with external knowledge, addressing the limitations of constant retrieval-augmented generation. This approach maintains response quality while significantly reducing the generation confidence drop, which was 10.43% for constant augmentation but only 0.36% for RAGate-MHA.
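A hypothetical sketch of the gating pattern: a binary gate inspects the dialogue context and decides whether to retrieve external knowledge before generating. The heuristic gate and the retrieve/generate placeholders below are stand-ins, not RAGate's learned gate or any real API.

```python
# Hypothetical sketch of adaptive retrieval gating in the spirit of RAGate:
# a binary gate looks at the dialogue context and decides whether to fetch
# external knowledge before generating. The gate here is a stub (a trained
# classifier in the paper, e.g. the attention-based RAGate-MHA variant);
# `retrieve` and `generate` are placeholder callables, not a real API.
from typing import Callable, List

def needs_knowledge(context: List[str]) -> bool:
    """Stub gate: in practice a learned classifier over the conversation."""
    last_turn = context[-1].lower()
    return any(cue in last_turn for cue in ("recommend", "where", "opening hours"))

def respond(context: List[str],
            retrieve: Callable[[str], List[str]],
            generate: Callable[[List[str], List[str]], str]) -> str:
    snippets = retrieve(context[-1]) if needs_knowledge(context) else []
    return generate(context, snippets)

# Toy usage with trivial stand-ins for the retriever and generator.
fake_retrieve = lambda q: [f"[KB entry about: {q}]"]
fake_generate = lambda ctx, kb: ("Using " + kb[0] if kb else "No retrieval needed") + " -> reply"
print(respond(["Hi!", "Can you recommend a quiet cafe nearby?"], fake_retrieve, fake_generate))
print(respond(["Hi!", "Thanks, that was helpful."], fake_retrieve, fake_generate))
```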
A Collision Clustering (CC) decoder is introduced, optimized for hardware implementation to enable real-time, scalable, fast, and resource-efficient quantum error correction. The ASIC implementation decodes a 1057-qubit surface code in 240 ns while consuming 7.85 mW and occupying 0.06 mm², achieving a 0.78% threshold with a circuit-level noise model.
Researchers at the University of Sheffield systematically analyze the core design principles of Transformer attention, identifying which components are essential for effective language modeling. Their work demonstrates that while token mixing is crucial, principles like the mathematical form and QK derivation can be significantly simplified, particularly when combined with standard attention in hybrid architectures, achieving comparable or improved predictive performance.
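To make the kind of simplification discussed above concrete, here is a minimal sketch contrasting standard scaled dot-product attention with a variant in which queries and keys come from a single shared projection; the dimensions and the particular simplification are illustrative assumptions, not the paper's exact ablation.

```python
# Minimal sketch: standard attention vs. a simplified QK derivation in which
# queries and keys share one projection. The paper studies such ablations
# inside full language models; this only shows the shapes and mechanics.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, d = 2, 6, 32
x = torch.randn(B, T, d)

W_q, W_k, W_v = (torch.randn(d, d) / d**0.5 for _ in range(3))
W_shared = torch.randn(d, d) / d**0.5

def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / d**0.5          # token mixing
    return F.softmax(scores, dim=-1) @ v

# Standard: separate query/key projections.
out_standard = attention(x @ W_q, x @ W_k, x @ W_v)

# Simplified: queries and keys come from the same projection of x.
qk = x @ W_shared
out_simplified = attention(qk, qk, x @ W_v)

print(out_standard.shape, out_simplified.shape)        # both (2, 6, 32)
```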
Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models, demonstrating that the latter consistently offer performance improvements across multiple tasks.
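For readers who want to try the benchmark, a minimal evaluation sketch follows, assuming LexGLUE's Hugging Face release (dataset id "lex_glue" with configs such as "scotus", "ledgar", or "unfair_tos") and a generic encoder checkpoint as a placeholder; neither the model nor the preprocessing matches the paper's experiments.

```python
# Sketch of running a model over one LexGLUE task, assuming the benchmark's
# Hugging Face release; "bert-base-uncased" is a generic placeholder, not one
# of the paper's legal-oriented models.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

dataset = load_dataset("lex_glue", "scotus")          # issue-area classification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

num_labels = dataset["train"].features["label"].num_classes
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels
)

batch = tokenizer(dataset["test"]["text"][:4], truncation=True, max_length=512,
                  padding=True, return_tensors="pt")
logits = model(**batch).logits                        # (4, num_labels), untrained head
print(logits.shape)
```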
Neuro-symbolic NLP methods aim to leverage the complementary strengths of large language models and formal logical solvers. However, current approaches are mostly static in nature, i.e., the integration of a target solver is predetermined at design time, hindering the ability to employ diverse formal inference strategies. To address this, we introduce an adaptive, multi-paradigm, neuro-symbolic inference framework that: (1) automatically identifies formal reasoning strategies from problems expressed in natural language; and (2) dynamically selects and applies specialized formal logical solvers via autoformalization interfaces. Extensive experiments on individual and multi-paradigm reasoning tasks support the following conclusions: LLMs are effective at predicting the necessary formal reasoning strategies with an accuracy above 90 percent. This enables flexible integration with formal logical solvers, resulting in our framework outperforming competing baselines by 27 percent and 6 percent compared to GPT-4o and DeepSeek-V3.1, respectively. Moreover, adaptive reasoning can even positively impact pure LLM methods, yielding gains of 10, 5, and 6 percent on zero-shot, CoT, and symbolic CoT settings with GPT-4o. Finally, although smaller models struggle with adaptive neuro-symbolic reasoning, post-training offers a viable path to improvement. Overall, this work establishes the foundations for adaptive LLM-symbolic reasoning, offering a path forward for unifying material and formal inferences on heterogeneous reasoning challenges.
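A hypothetical sketch of the adaptive dispatch described above: an LLM first labels the formal reasoning strategy a problem needs, then the problem is autoformalised and routed to the matching solver. The strategy labels, the `llm_classify` stub, and the solver backends are illustrative assumptions, not the paper's interfaces.

```python
# Hypothetical sketch of adaptive neuro-symbolic dispatch: predict a strategy
# label, then route the problem to the matching formal solver.
from typing import Callable, Dict

STRATEGIES = ("first_order_logic", "constraint_satisfaction", "arithmetic")

def llm_classify(problem: str) -> str:
    """Placeholder for prompting an LLM to pick a strategy label."""
    if any(w in problem.lower() for w in ("schedule", "assign", "seating")):
        return "constraint_satisfaction"
    if any(ch.isdigit() for ch in problem):
        return "arithmetic"
    return "first_order_logic"

SOLVERS: Dict[str, Callable[[str], str]] = {
    # Each entry would wrap an autoformalisation step plus a real backend
    # (e.g. a FOL prover, a CSP solver, an equation solver); stubs keep the
    # sketch self-contained.
    "first_order_logic": lambda p: f"[FOL prover on] {p}",
    "constraint_satisfaction": lambda p: f"[CSP solver on] {p}",
    "arithmetic": lambda p: f"[equation solver on] {p}",
}

def solve(problem: str) -> str:
    strategy = llm_classify(problem)
    assert strategy in STRATEGIES
    return SOLVERS[strategy](problem)

print(solve("If all philosophers are mortal and Socrates is a philosopher, is Socrates mortal?"))
print(solve("Assign four talks to three rooms so no two overlap in the same room."))
```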
Large-scale quantum computers have the potential to deliver computational capabilities beyond those of conventional computers for certain problems. However, the physical qubits within a quantum computer are prone to noise and decoherence, which must be corrected in order to perform reliable, fault-tolerant quantum computations. Quantum Error Correction (QEC) provides the path for realizing such computations. QEC generates a continuous stream of data that decoders must process at the rate it is received, which can be as fast as 1 MHz in superconducting quantum computers. A little-known fact of QEC is that if the decoder infrastructure cannot keep up, a data backlog problem is encountered and the quantum computer runs exponentially slower. Today's leading approaches to quantum error correction are not scalable, as existing decoders typically run slower as the problem size is increased, inevitably hitting the backlog problem. That is: the current leading proposal for fault-tolerant quantum computation is not scalable. Here, we show how to parallelize decoding to achieve almost arbitrary speed, removing this roadblock to scalability. Our parallelization requires some classical feed-forward decisions to be delayed, leading to a slow-down of the logical clock speed. However, the slow-down is now only polynomial in code size, averting the exponential slowdown. We numerically demonstrate our parallel decoder for the surface code, showing no noticeable reduction in logical fidelity compared to previous decoders and demonstrating the parallelization speedup.
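The parallelisation idea can be sketched at a high level as windowed decoding over the syndrome stream; the stub inner decoder, the window size, and the correction format below are simplified assumptions, not the paper's surface-code decoder.

```python
# Illustrative sketch of the parallelisation idea: split the syndrome stream
# into windows and hand the windows to a pool of decoder workers so throughput
# scales with the number of workers. The stub `decode_window` stands in for a
# real inner decoder (e.g. minimum-weight perfect matching).
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def decode_window(args):
    """Stub inner decoder: pair up defects within one window of rounds."""
    window_id, syndrome = args                 # syndrome: (rounds, n_checks) of 0/1
    defects = int(syndrome.sum())
    return window_id, defects // 2             # pretend each pair yields one correction

def parallel_decode(syndrome_stream: np.ndarray, rounds_per_window: int, workers: int = 4):
    windows = [
        (i, syndrome_stream[start:start + rounds_per_window])
        for i, start in enumerate(range(0, len(syndrome_stream), rounds_per_window))
    ]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(decode_window, windows))
    # pool.map preserves submission order; sort by window id anyway before stitching.
    return [corrections for _, corrections in sorted(results)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    stream = (rng.random((10_000, 50)) < 0.01).astype(np.uint8)   # toy syndrome stream
    print(parallel_decode(stream, rounds_per_window=1_000))
```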