alphaXiv

MSSL

University College London

A Survey of Reinforcement Learning for Large Reasoning Models

09 Oct 2025

University of Washington, Shanghai AI Laboratory

This survey paper systematically synthesizes advancements in Reinforcement Learning (RL) for Large Reasoning Models (LRMs), moving beyond human alignment to focus on enhancing intrinsic reasoning capabilities through verifiable rewards. It identifies key components, challenges, and future directions for scaling RL towards Artificial SuperIntelligence (ASI).

#computer-science #artificial-intelligence #computation-and-language

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

08 Nov 2025

University of Illinois at Urbana-Champaign

University of California, Santa Barbara

A comprehensive survey formally defines Agentic Reinforcement Learning (RL) for Large Language Models (LLMs) as a Partially Observable Markov Decision Process (POMDP), distinct from conventional LLM-RL, and provides a two-tiered taxonomy of capabilities and task domains. The work consolidates open-source resources and outlines critical open challenges for the field.

#agentic-frameworks #agents #computer-science

Estimating the EVSI with Gaussian Approximations and Spline-Based Series Methods

30 Jan 2024

University of Toronto, University College London

Background. The Expected Value of Sample Information (EVSI) measures the expected benefits that could be obtained by collecting additional data. Estimating EVSI using the traditional nested Monte Carlo method is computationally expensive but the recently developed Gaussian approximation (GA) approach can efficiently estimate EVSI across different sample sizes. However, the conventional GA may result in biased EVSI estimates if the decision models are highly nonlinear. This bias may lead to suboptimal study designs when GA is used to optimize the value of different studies. Therefore, we extend the conventional GA approach to improve its performance for nonlinear decision models. Methods. Our method provides accurate EVSI estimates by approximating the conditional benefit based on two steps. First, a Taylor series approximation is applied to estimate the conditional benefit as a function of the conditional moments of the parameters of interest using a spline, which is fitted to the samples of the parameters and the corresponding benefits. Next, the conditional moments of parameters are approximated by the conventional GA and Fisher information. The proposed approach is applied to several data collection exercises involving non-Gaussian parameters and nonlinear decision models. Its performance is compared with the nested Monte Carlo method, the conventional GA approach, and the nonparametric regression-based method for EVSI calculation. Results. The proposed approach provides accurate EVSI estimates across different sample sizes when the parameters of interest are non-Gaussian and the decision models are nonlinear. The computational cost of the proposed method is similar to other novel methods. Conclusions. The proposed approach can estimate EVSI across sample sizes accurately and efficiently, which may support researchers in determining an economically optimal study design using EVSI.

#statistics #methodology

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

25 Aug 2025

Huawei Noah’s Ark Lab, Chinese Academy of Sciences

Researchers from UCL AI Centre and Huawei Noah’s Ark Lab developed Memento, a memory-based learning framework enabling LLM agents to continually adapt and improve without fine-tuning their underlying large language models. The framework achieved top performance on complex benchmarks, including 87.88% Pass@3 on GAIA and 95.0% accuracy on SimpleQA, demonstrating efficient, robust adaptation and generalization.

#agentic-frameworks #agents #computer-science

GARNN: An Interpretable Graph Attentive Recurrent Neural Network for Predicting Blood Glucose Levels via Multivariate Time Series

26 Feb 2024

Imperial College London, University College London

Researchers from University College London and collaborators developed GARNN, an interpretable graph attentive recurrent neural network for predicting blood glucose levels from multivariate time series data. The model consistently achieved state-of-the-art prediction accuracy across four clinical datasets while providing clinically justifiable temporal and global interpretations of variable importance, particularly excelling at attributing sparse event contributions.

#ai-for-health #attention-mechanisms #computer-science

Time-dependent influence metric for cascade dynamics on networks

29 Apr 2025

University College London, Aalto University

An algorithm for efficiently calculating the expected size of single-seed cascade dynamics on networks is proposed and tested. The expected size is a time-dependent quantity and so enables the identification of the nodes that are most influential early or late in the spreading process. The measure is accurate for both critical and subcritical dynamic regimes and so generalises the nonbacktracking centrality that was previously shown to successfully identify the most influential single spreaders in a model of critical epidemics on networks.

#physics #data-analysis-statistics-and-probability #physics-and-society

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

12 Apr 2021

Tim Rocktäschel

Scott Yih

New York University, University College London

This paper from Facebook AI Research and University College London introduces Retrieval-Augmented Generation (RAG), a general-purpose model that combines pre-trained parametric language generation with a non-parametric differentiable retriever. RAG achieved state-of-the-art results on multiple knowledge-intensive NLP tasks, demonstrating improved factual accuracy and the ability to update its knowledge by simply swapping out its document index.

#computer-science #computation-and-language #machine-learning

Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning

12 Mar 2024

University College London, Vectify AI

Mafin, a framework developed by researchers at UCL and Vectify AI, enhances black-box embedding models for domain-specific retrieval by augmenting them with a trainable white-box model. This method consistently improves retrieval accuracy in RAG systems, surpassing both fine-tuned open-source models and the original black-box models in supervised settings, and showing effectiveness even with unsupervised data generation.

#computer-science #artificial-intelligence #computation-and-language

Differentiable and accelerated wavelet transforms on the sphere and ball

14 Mar 2024

University College London, Alan Turing Institute

This work introduces highly accelerated and differentiable directional wavelet transforms for data on the 2D sphere (S²) and 3D ball (B³), implemented in JAX. It provides S2WAV and S2BALL, open-source software libraries enabling seamless integration of multiscale, anisotropic signal processing with modern machine learning frameworks, while achieving orders of magnitude speedups.

#instrumentation-and-methods-for-astrophysics #computer-science #machine-learning

Exaptation: Academic mentees' career pathway to be independent and impactful

30 Aug 2024

University College London, Southeast University

Researchers from SUSTech and UCL developed a data-driven framework to analyze how academic mentees achieve independence and impact, identifying an optimal moderate divergence from mentors' research topics. Their analysis, spanning 60 years and 500,000 mentee records, reveals that a 'follow and innovate' strategy and leveraging mentor's secondary topics are key pathways to surpassing mentor's impact.

#causal-inference #computer-science #digital-libraries

Computational limits to the legibility of the imaged human brain

02 Apr 2024

University College London, University of Glasgow

This study comprehensively assessed the extent to which individual biological characteristics can be predicted from multimodal neuroimaging data, utilizing an unprecedented scale of data and computational resources. It found a significant disparity in predictability, with constitutional traits (e.g., sex, age) being highly predictable, while complex psychological and many chronic disease characteristics remained largely unresolved by current neuroimaging paradigms.

#ai-for-health #computer-science #computer-vision-security

Enhancing the Energy Gap of Random Graph Problems via XX-catalysts in Quantum Annealing

24 Sep 2024

University College London

One of the bottlenecks in solving combinatorial optimisation problems using quantum annealers is the emergence of exponentially-closing energy gaps between the ground state and the first excited state during the annealing, which indicates that a first-order phase transition is taking place. The minimum energy gap scales inversely with the exponential of the system size, ultimately resulting in an exponentially large time required to ensure the adiabatic evolution. In this paper we demonstrate that employing multiple XX-catalysts on all the edges of a graph upon which a MWIS (Maximum Weighted Independent Set) problem is defined significantly enhances the minimum energy gap. Remarkably, our analysis shows that the more severe the first-order phase transition, the more effective the catalyst is in opening the gap. This result is based on a detailed statistical analysis performed on a large number of randomly generated MWIS problem instances on both Erdős-Rényi and Barabási-Albert graphs. We also observe that similar performance cannot be achieved by the non-stoquastic version of the same catalyst, with the stoquastic catalyst being the preferred choice in this context.

#physics #quantum-physics

Approximate Top-k for Increased Parallelism

05 Dec 2024

University College London, Graphcore Research

Researchers from UCL and Graphcore Research introduce a two-stage bucketed approximate top-k algorithm to enhance parallelism on machine learning accelerators, reducing the runtime of the top-k operation by over 4x in LLM sparse attention with minimal impact on accuracy. The method also contributes to an approximately 10% end-to-end speed-up in LLM generation and is provided as a publicly available PyTorch implementation.

#computer-science #machine-learning #efficient-transformers
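
The two-stage bucketed idea summarized above can be illustrated with a minimal sketch. This is not the authors' PyTorch implementation: the function name `bucketed_approx_topk`, the interleaved bucket layout, and the parameters `num_buckets` and `k_per_bucket` are assumptions made for illustration. Stage one takes a small top-k inside each bucket, which is work that can run independently per bucket; stage two takes an exact top-k over the surviving candidates.

```python
import heapq
import random


def bucketed_approx_topk(values, k, num_buckets, k_per_bucket=1):
    """Two-stage approximate top-k (illustrative sketch, hypothetical names).

    Stage 1: split `values` into `num_buckets` interleaved buckets and keep
             the `k_per_bucket` largest entries of each bucket.
    Stage 2: run an exact top-k over the surviving candidates.
    """
    if num_buckets * k_per_bucket < k:
        raise ValueError("need num_buckets * k_per_bucket >= k candidates")

    candidates = []
    for b in range(num_buckets):
        bucket = values[b::num_buckets]  # strided slice forms bucket b
        candidates.extend(heapq.nlargest(k_per_bucket, bucket))

    # Exact selection over the (much smaller) candidate list.
    return heapq.nlargest(k, candidates)


def recall(approx, exact):
    """Fraction of the exact top-k values recovered by the approximation."""
    remaining = list(exact)
    hits = 0
    for v in approx:
        if v in remaining:
            remaining.remove(v)
            hits += 1
    return hits / len(exact)


random.seed(0)
scores = [random.random() for _ in range(4096)]
exact = heapq.nlargest(16, scores)
approx = bucketed_approx_topk(scores, k=16, num_buckets=64, k_per_bucket=1)
print(f"recall of exact top-16: {recall(approx, exact):.2f}")
```

The result is approximate because two of the true top-k values can fall into the same bucket, in which case only `k_per_bucket` of them survive stage one. Raising `num_buckets` or `k_per_bucket` trades extra stage-two work for higher recall.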

Compressed representation of brain genetic transcription

20 Jun 2024

University College London

Researchers at University College London and the University of Bordeaux systematically evaluated dimensionality reduction methods for whole-brain genetic transcription, establishing deep auto-encoders as a superior approach to traditional Principal Component Analysis (PCA). The auto-encoder provided greater fidelity in data reconstruction and generated more anatomically plausible latent structures, improving the prediction of diverse neurophysiological targets.

#ai-for-genomics #computer-science #machine-learning

A Global Perspective with Updated Constraints on the Ultra-hot Jupiter WASP-19b: Atmospheric Properties and Stellar Activity

04 Dec 2024

California Institute of Technology, University College London

We present a detailed reanalysis of the atmospheric properties of WASP-19b, an ultra-hot Jupiter (1.14 M_Jup, 1.41 R_Jup) orbiting an active Sun-like star every 0.79 day. We reanalyze a transit and secondary eclipse of WASP-19b observed by the Hubble Space Telescope's Wide Field Camera 3 spectrograph (1.1 - 1.7 microns). When combined with Spitzer photometry at longer wavelengths, our analyses indicate the presence of water absorption features in both the planet's transmission and emission spectra, consistent with results from previously published studies. We jointly fit WASP-19b's dayside emission and transmission spectra with a retrieval model in order to constrain its atmospheric composition, and explore the effect of stellar activity on its transmission spectrum in greater depth. We also compare our dayside emission spectrum to predictions from a general circulation model, and conclude that magnetic drag appears to be relatively unimportant in shaping WASP-19b's atmospheric circulation. Lastly, we compare the size of WASP-19b's dayside water absorption feature to the population of hot Jupiters with similar measurements, and show that it is located in the transitional irradiation regime where temperature inversions first begin to emerge. As in previous studies, we find that the current observations provide relatively weak constraints on this planet's atmospheric properties. These constraints could be significantly improved by the addition of spectroscopically resolved observations at longer wavelengths with JWST/NIRSpec PRISM.

#earth-and-planetary-astrophysics #physics

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

31 Aug 2025

University of Cambridge

National University of Singapore

This survey paper defines and systematically reviews the emerging paradigm of self-evolving AI agents, which bridge static foundation models with dynamic lifelong adaptability. It introduces a unified conceptual framework and a comprehensive taxonomy of evolution techniques, mapping the progression towards continuous self-improvement in AI systems.

#agentic-frameworks #agents #computer-science

Visual Planning: Let's Think Only with Images

29 Sep 2025

Caiqi Zhang

University of Cambridge, Google

A new paradigm, Visual Planning, enables AI models to perform multi-step reasoning solely through sequences of images, eliminating the need for textual mediation. The Visual Planning via Reinforcement Learning (VPRL) framework, applied to vision-first navigation tasks, achieved an 80.6% average Exact Match rate and 84.9% Progress Rate, outperforming text-based reasoning methods by 27%.

#computer-science #artificial-intelligence #computation-and-language

Deep Research Agents: A Systematic Examination And Roadmap

03 Sep 2025

Huawei Noah’s Ark Lab, University College London

This paper provides a systematic examination and roadmap for Deep Research (DR) agents, defining their core components, classifying architectures, and outlining future challenges. It consolidates disparate efforts in developing AI systems that integrate dynamic reasoning, adaptive planning, and iterative tool use for complex informational research tasks, offering a structured understanding for the field.

#agentic-frameworks #agents #computer-science

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

18 Oct 2025

Zhongxing Xu

South China University of Technology

California Institute of Technology

A comprehensive survey by researchers from Shanghai AI Lab and various global institutions outlines the intricate relationship between scientific large language models (Sci-LLMs) and their data foundations, tracing their evolution towards autonomous agents for scientific discovery. The paper establishes a taxonomy for scientific data and knowledge, meticulously reviews over 270 datasets and 190 benchmarks, and identifies critical data challenges alongside future paradigms.

#agentic-frameworks #agents #computer-science

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

12 May 2025

Niels W

University of Toronto, UC Berkeley

Finetuning Large Language Models on narrow, implicitly malicious tasks, such as generating insecure code without disclosure, can lead to broad, general-purpose misalignment across unrelated domains. This "emergent misalignment" was observed in models like GPT-4o, which subsequently produced misaligned responses 20% of the time on diverse free-form questions, and appears distinct from known vulnerabilities like jailbreaking.

#computer-science #artificial-intelligence #computation-and-language
