University of Bristol
A comprehensive survey formally defines Agentic Reinforcement Learning (RL) for Large Language Models (LLMs) as a Partially Observable Markov Decision Process (POMDP), distinct from conventional LLM-RL, and provides a two-tiered taxonomy of capabilities and task domains. The work consolidates open-source resources and outlines critical open challenges for the field.
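For reference, the POMDP framing means the agent never sees the full environment state, only a stream of observations (tool outputs, user turns). A generic statement of the tuple, in standard notation rather than necessarily the survey's own:

```latex
% Generic POMDP; standard notation, not necessarily the survey's.
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O},\;
    P(s' \mid s, a),\; R(s, a),\; \Omega(o \mid s'),\; \gamma \rangle
```

The policy $\pi(a \mid h_t)$ then conditions on the observation-action history $h_t = (o_1, a_1, \ldots, o_t)$, with actions spanning both token emission and tool calls.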
Researchers from RAND and partners propose a structured, six-layer framework for verifying international agreements on large-scale AI development and deployment. The framework decomposes verification into subgoals and identifies specific R&D challenges for building robust, confidential systems to foster trust and mitigate global risks.
Researchers from OIST and the University of Bristol developed a quantum information-based method using semidefinite programming to quantify entanglement depth and spatial structure in quantum spin liquids at finite temperatures. Applied to the Kagome and Kitaev models, this approach identifies distinct temperature scales for the onset of bipartite and genuine multipartite entanglement, correlating these with specific thermodynamic features.
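For context, entanglement depth has a standard definition that the paper's SDP bounds presumably target: a state is $k$-producible if it mixes pure states that factor into blocks of at most $k$ sites, and the depth is the smallest such $k$. A textbook statement:

```latex
% k-producibility (textbook definition); the entanglement depth is the
% minimum k for which a decomposition of this form exists.
\rho = \sum_i p_i\, |\psi_i\rangle\langle\psi_i|, \qquad
|\psi_i\rangle = \bigotimes_j |\phi_i^{(j)}\rangle, \quad
\text{each } |\phi_i^{(j)}\rangle \text{ supported on at most } k \text{ sites}
```

Certifying depth greater than $k$ therefore witnesses genuine $(k{+}1)$-partite entanglement somewhere in the lattice.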
This research introduces Activation Addition (ActAdd), a technique for steering Large Language Models by directly manipulating their internal activation states at inference time. ActAdd achieves state-of-the-art results in controlling output properties like topic, toxicity, and sentiment across various models (e.g., OPT, LLaMA-3) while maintaining the model's general knowledge and fluency.
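The core mechanic is simple to sketch: record activations for a contrasting prompt pair at one layer, take their difference as a steering vector, and add it back during generation. A minimal sketch with GPT-2 via Hugging Face transformers; the layer, coefficient, and prompt pair are illustrative choices, not the paper's settings:

```python
# Minimal ActAdd-style steering sketch (GPT-2; layer, coefficient, and
# prompt pair are illustrative, not the paper's settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, COEFF = 5, 4.0

def resid_after_block(prompt: str) -> torch.Tensor:
    """Residual stream just after block LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER + 1]  # hidden_states[0] is the embedding output

# Steering vector: activation difference of a contrasting prompt pair.
h_pos, h_neg = resid_after_block("Love"), resid_after_block("Hate")
n = min(h_pos.shape[1], h_neg.shape[1])
steer = COEFF * (h_pos[:, :n] - h_neg[:, :n])

def hook(module, inputs, output):
    h = output[0]
    if h.shape[1] >= n:  # act on the prompt pass, not on cached decode steps
        h[:, :n] = h[:, :n] + steer.to(h.dtype)
    return (h,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(hook)
ids = tok("I think dogs are", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```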
A self-improving coding agent, SICA, autonomously edits its own Python codebase to enhance performance. This system achieved an improvement from 17% to 53% accuracy on SWE-Bench Verified tasks by developing new tools and refining its operational logic through a non-gradient-based learning mechanism.
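The outer loop amounts to benchmark-guided hill climbing over the agent's own source. A schematic sketch; `run_benchmark` and `propose_patch` are hypothetical placeholders, not SICA's actual interfaces:

```python
# Schematic self-improvement loop; the two helpers are hypothetical stubs.
import shutil, subprocess, tempfile

def run_benchmark(path: str) -> float:
    """Hypothetical: score the agent at `path` on SWE-Bench-style tasks."""
    raise NotImplementedError

def propose_patch(path: str, score: float) -> str:
    """Hypothetical: agent inspects its own source and returns a diff path."""
    raise NotImplementedError

def improve(agent_dir: str, iterations: int = 10) -> None:
    best = run_benchmark(agent_dir)
    for _ in range(iterations):
        trial = tempfile.mkdtemp()
        shutil.copytree(agent_dir, trial, dirs_exist_ok=True)
        patch = propose_patch(trial, score=best)      # e.g. a new tool
        subprocess.run(["git", "apply", patch], cwd=trial, check=True)
        score = run_benchmark(trial)
        if score > best:                   # hill-climb: keep improvements only
            best = score
            shutil.copytree(trial, agent_dir, dirs_exist_ok=True)
```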
Ego4D introduces a large-scale collection of 3,670 hours of egocentric video, captured globally from 931 unique wearers, complemented by modalities like audio, 3D environment meshes, and eye gaze. This dataset and its five associated benchmarks aim to advance research in first-person visual perception for embodied AI, enabling tasks such as episodic memory, hand-object manipulation, and activity forecasting.
Researchers at institutions including MIT CSAIL and Anthropic introduce targeted latent adversarial training (LAT) to bolster large language model robustness against persistent harmful behaviors. This technique effectively enhances jailbreak defenses, removes backdoors, and improves machine unlearning, often achieving superior results with orders of magnitude less computation.
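The idea is adversarial training in activation space rather than input space: an inner loop finds a bounded latent perturbation that elicits the unwanted behavior, and the outer step trains the model to behave well under it. A minimal PyTorch sketch; the perturbed layer, bound, and step counts are illustrative, and the paper's targeted variant is richer:

```python
# LAT-style sketch: PGD in a transformer block's output space, then a
# robust training step. Hyperparameters are illustrative.
import torch

def lat_step(model, block, ids_bad, labels_bad, ids_good, labels_good,
             prompt_len, eps=0.05, pgd_steps=6, lr=1e-2):
    """Inner loop: perturb `block`'s output on the shared prompt positions
    toward the harmful completion (ids_bad/labels_bad). Outer step: train
    the model to give the desired completion (ids_good/labels_good)."""
    d = model.config.hidden_size
    delta = torch.zeros(ids_bad.shape[0], prompt_len, d, requires_grad=True)

    def hook(module, inputs, output):
        h = output[0].clone()
        h[:, :prompt_len] += delta.to(h.dtype)
        return (h,) + output[1:]

    handle = block.register_forward_hook(hook)
    try:
        for _ in range(pgd_steps):  # steer prompt latents toward the target
            loss_bad = model(ids_bad, labels=labels_bad).loss
            (grad,) = torch.autograd.grad(loss_bad, delta)
            with torch.no_grad():
                delta -= lr * grad.sign()   # lower CE on the harmful labels
                delta.clamp_(-eps, eps)
        delta = delta.detach()
        # Train the model to behave well despite the perturbation.
        loss_good = model(ids_good, labels=labels_good).loss
        loss_good.backward()                # caller applies optimizer.step()
        return loss_good.item()
    finally:
        handle.remove()
```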
This survey from Shanghai Jiao Tong University and collaborators systematically categorizes Process Reward Models (PRMs) for Large Language Models, detailing methods for their data generation, construction, and deployment. The work highlights how PRMs provide fine-grained, step-level feedback, which significantly improves LLM alignment and reasoning quality in complex, multi-step tasks compared to outcome-only reward models.
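A typical deployment pattern, as shown below, is best-of-N selection: the PRM scores every step of each candidate solution and the weakest-step (min) aggregate decides the winner. `prm_score` is a hypothetical stand-in for a trained step-reward model:

```python
# PRM-guided best-of-N selection sketch; `prm_score` is hypothetical.
from typing import List

def prm_score(problem: str, steps_so_far: List[str]) -> float:
    """Hypothetical: returns P(latest step is correct) in [0, 1]."""
    raise NotImplementedError

def best_of_n(problem: str, candidates: List[str]) -> str:
    def aggregate(solution: str) -> float:
        steps = [s for s in solution.split("\n") if s.strip()]
        # Min-aggregation: a chain is only as sound as its weakest step.
        return min(prm_score(problem, steps[: i + 1])
                   for i in range(len(steps)))
    return max(candidates, key=aggregate)
```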
The MARC framework from the University of Bristol and Memories.ai Research efficiently processes video for Visual Language Models, achieving a 95% reduction in visual tokens while preserving near-identical accuracy compared to uncompressed baselines. This framework utilizes a visual memory retriever and RL-based distillation, resulting in a 72.4% reduction in GPU memory and a 23.9% decrease in LLM generation latency.
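A minimal sketch of the retrieval step: keep only the frames whose features best match the query before handing tokens to the LLM. Shapes and the similarity measure are assumptions; MARC's retriever and RL-based distillation are richer than this:

```python
# Top-k frame retrieval by query similarity (illustrative shapes).
import torch
import torch.nn.functional as F

def retrieve_frames(frame_feats: torch.Tensor,  # (T, d) one feature per frame
                    query: torch.Tensor,        # (d,) text-query embedding
                    k: int = 8) -> torch.Tensor:
    sims = F.cosine_similarity(frame_feats, query.unsqueeze(0), dim=-1)  # (T,)
    top = sims.topk(min(k, frame_feats.shape[0])).indices.sort().values
    return frame_feats[top]                     # temporally ordered subset
```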
The HD-EPIC dataset provides 41 hours of multi-day egocentric video from home kitchens, capturing unscripted activities with dense, multi-modal annotations including 3D scene digital twins, nutritional tracking, and explicit 'how' and 'why' action clauses. Benchmarking reveals state-of-the-art video-language models achieve only 37.6% accuracy on its Visual Question Answering tasks, significantly below human performance of 90.3%, highlighting current AI limitations in complex egocentric understanding.
Referring expression understanding in remote sensing poses unique challenges, as it requires reasoning over complex object-context relationships. While supervised fine-tuning (SFT) of multimodal large language models achieves strong performance given massive labeled datasets, it struggles in data-scarce scenarios and generalizes poorly. To address this limitation, we propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 requires the model to first generate explicit, interpretable reasoning chains that decompose the referring expression, and then to leverage these rationales to localize target objects. This "reason first, then act" process enables the model to make more effective use of limited annotations, enhances generalization, and provides interpretability. We validate Geo-R1 on three carefully designed few-shot geospatial referring benchmarks, where our model consistently and substantially outperforms SFT baselines. It also demonstrates strong cross-dataset generalization, highlighting its robustness. Code and data will be released at: this https URL.
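For a concrete feel of how such an RFT setup can be rewarded verifiably, here is a hypothetical reward function: the rollout must emit a reasoning block before a bounding box, and localization is scored by IoU against ground truth. The tags and weights are illustrative, not Geo-R1's:

```python
# Hypothetical verifiable reward for "reason first, then act" referring.
import re

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

BOX = re.compile(r"<answer>\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]</answer>")

def reward(rollout: str, gt_box) -> float:
    fmt = 0.1 if re.search(r"<think>.+?</think>", rollout, re.S) else 0.0
    m = BOX.search(rollout)
    if not m:
        return fmt                       # no box emitted: format credit only
    return fmt + iou(tuple(map(int, m.groups())), gt_box)
```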
The paper introduces Natural Language Reinforcement Learning (NLRL), a framework that redefines core reinforcement learning components like value functions and policies in natural language using large language models. This approach enables agents to gain a deeper, interpretable understanding of their experiences, demonstrating superior performance and more stable learning on multi-step tasks like maze navigation and board games compared to traditional and LLM-based baselines.
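A sketch of what a "language value function" can look like: per-rollout textual assessments are aggregated by the LLM itself instead of averaging scalar returns. `llm` is a hypothetical completion function, and the prompts are illustrative:

```python
# Language value estimation sketch; `llm` is a hypothetical stub.
from typing import List

def llm(prompt: str) -> str:
    """Hypothetical: returns a text completion."""
    raise NotImplementedError

def language_value(state_desc: str, rollout_descs: List[str]) -> str:
    """Aggregate per-rollout evaluations into one language value estimate."""
    evals = [llm(f"Board state:\n{state_desc}\n"
                 f"Rollout:\n{r}\n"
                 "Assess the prospects for the current player.")
             for r in rollout_descs]
    return llm("Summarize these assessments into a single evaluation of "
               "the state, noting key threats and opportunities:\n"
               + "\n---\n".join(evals))
```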
Researchers from the University of Bristol and UMass Amherst developed Laplace-LoRA, a method that applies post-hoc Bayesian inference to LoRA parameters to improve the calibration and uncertainty quantification of fine-tuned large language models. This approach significantly lowers calibration errors and negative log-likelihood across various tasks and models, while maintaining predictive accuracy and computational efficiency.
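A stripped-down sketch of the post-hoc recipe: treat the fine-tuned LoRA weights as a MAP estimate, approximate the curvature around them, and sample adapter weights for the predictive. Laplace-LoRA itself uses Kronecker-factored curvature and a linearized predictive; this diagonal, sampling-based version only conveys the idea:

```python
# Diagonal post-hoc Laplace over adapter parameters (illustrative only).
import torch

def diag_laplace(model, lora_params, loader, prior_prec=1.0):
    fisher = [torch.zeros_like(p) for p in lora_params]
    for batch in loader:
        model.zero_grad()
        loss = model(batch["input_ids"], labels=batch["labels"]).loss
        loss.backward()
        for f, p in zip(fisher, lora_params):
            f += p.grad.detach() ** 2   # crude empirical-Fisher diagonal
    # Posterior ~ N(theta_MAP, (F + prior_prec I)^-1), diagonal approximation.
    return [1.0 / (f + prior_prec) for f in fisher]

def sample_predictive(model, lora_params, var, inputs, n_samples=8):
    means = [p.detach().clone() for p in lora_params]
    probs = 0
    for _ in range(n_samples):
        with torch.no_grad():
            for p, m, v in zip(lora_params, means, var):
                p.copy_(m + v.sqrt() * torch.randn_like(m))
        probs = probs + model(inputs).logits.softmax(-1)
    with torch.no_grad():
        for p, m in zip(lora_params, means):
            p.copy_(m)                   # restore the MAP adapter weights
    return probs / n_samples
```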
Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantics-related tasks – action classification, ImageNet classification, etc. In this paper we focus on evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose estimation, point and object tracking, and depth estimation. We show that by learning from very large video datasets, masked auto-encoding (MAE) with transformer video models actually scales, consistently improving performance on these 4D tasks as model size increases from 20M all the way to 22B parameters – by far the largest self-supervised video model reported to date. Rigorous apples-to-apples comparison with many recent image and video models demonstrates the benefits of scaling 4D representations. Pretrained models are available at this https URL.
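The pretraining objective itself is the standard MAE recipe applied to space-time patches. A toy sketch of the input preparation; patch sizes and mask ratio are illustrative, and divisibility is assumed:

```python
# Tube-style masking for video MAE: patchify, then drop most tokens.
import torch

def tube_mask(video, patch=(2, 16, 16), mask_ratio=0.9):
    """video: (C, T, H, W). Returns kept patch tokens and the boolean mask."""
    C, T, H, W = video.shape
    pt, ph, pw = patch
    tokens = video.reshape(C, T // pt, pt, H // ph, ph, W // pw, pw)
    tokens = tokens.permute(1, 3, 5, 0, 2, 4, 6).flatten(3).flatten(0, 2)
    # tokens: (N, C*pt*ph*pw) with N = (T/pt)*(H/ph)*(W/pw)
    n_keep = max(1, int(tokens.shape[0] * (1 - mask_ratio)))
    keep = torch.randperm(tokens.shape[0])[:n_keep]
    mask = torch.ones(tokens.shape[0], dtype=torch.bool)
    mask[keep] = False                   # True = masked / to reconstruct
    return tokens[keep], mask
```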
A framework integrates Large Language Model general knowledge with domain-specific experiences for sequential decision-making by combining memory-driven value estimation and LLM prior refinement. This approach, developed by researchers from CAS, UCAS, Imperial College London, UCL, and University of Bristol, demonstrates over 40% performance improvement on complex ALFWorld environments and a 75% gain on unseen tasks compared to pretrained LLMs.
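One way to picture the combination: experience-derived value estimates and LLM prior probabilities are blended when scoring actions. The memory interface and blending rule below are assumptions for illustration, not the paper's exact mechanism:

```python
# Blending memory-driven values with an LLM action prior (illustrative).
from collections import defaultdict
from typing import Dict, List

class ExperienceMemory:
    def __init__(self):
        self.returns: Dict[tuple, List[float]] = defaultdict(list)

    def add(self, state_key: str, action: str, ret: float) -> None:
        self.returns[(state_key, action)].append(ret)

    def value(self, state_key: str, action: str) -> float:
        rs = self.returns.get((state_key, action))
        return sum(rs) / len(rs) if rs else 0.0

def choose_action(memory: ExperienceMemory, llm_prior: Dict[str, float],
                  state_key: str, alpha: float = 0.5) -> str:
    """Score = alpha * experience value + (1 - alpha) * LLM prior."""
    def score(a: str) -> float:
        return alpha * memory.value(state_key, a) + (1 - alpha) * llm_prior[a]
    return max(llm_prior, key=score)
```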
Chain-of-Thought (CoT) prompting plays an indispensable role in endowing large language models (LLMs) with complex reasoning capabilities. However, CoT currently faces two fundamental challenges: (1) Sufficiency, which ensures that the generated intermediate inference steps comprehensively cover and substantiate the final conclusion; and (2) Necessity, which identifies the inference steps that are truly indispensable for the soundness of the resulting answer. We propose a causal framework that characterizes CoT reasoning through the dual lenses of sufficiency and necessity. Incorporating causal Probability of Sufficiency and Necessity allows us not only to determine which steps are logically sufficient or necessary to the prediction outcome, but also to quantify their actual influence on the final reasoning outcome under different intervention scenarios, thereby enabling the automated addition of missing steps and the pruning of redundant ones. Extensive experimental results on various mathematical and commonsense reasoning benchmarks confirm substantial improvements in reasoning efficiency and reduced token usage without sacrificing accuracy. Our work provides a promising direction for improving LLM reasoning performance and cost-effectiveness.
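The causal quantities invoked here have standard Pearl-style definitions; reading $x$ as "step present" and $y$ as "answer correct":

```latex
% Pearl's probabilities of necessity (PN) and sufficiency (PS)
\mathrm{PN} = P\big(Y_{x'} = y' \mid X = x,\, Y = y\big), \qquad
\mathrm{PS} = P\big(Y_{x} = y \mid X = x',\, Y = y'\big)
```

Intuitively, a step with high PN would flip the answer if removed (so it must be kept), while a candidate step with high PS would establish the answer if added, which is what licenses the automated pruning and addition described above.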
Score matching is a vital tool for learning the distribution of data, with applications across many areas including diffusion processes, energy-based modelling, and graphical model estimation. Despite all these applications, little work explores its use when data is incomplete. We address this by adapting score matching (and its major extensions) to work with missing data in a flexible setting where data can be partially missing over any subset of the coordinates. We provide two score matching variations for general use: an importance weighting (IW) approach and a variational approach. We provide finite-sample bounds for our IW approach in finite domain settings and show it to have especially strong performance in small-sample, lower-dimensional cases. Complementing this, we show our variational approach to be strongest in more complex, high-dimensional settings, which we demonstrate on graphical model estimation tasks on both real and simulated data.
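For orientation, the fully observed objective that both variants adapt is Hyvärinen's score matching loss, with $s_\theta(x) = \nabla_x \log p_\theta(x)$:

```latex
% Hyvarinen's score matching objective (fully observed data)
J(\theta) = \mathbb{E}_{p(x)}\!\left[
  \operatorname{tr}\!\big(\nabla_x s_\theta(x)\big)
  + \tfrac{1}{2}\,\lVert s_\theta(x)\rVert_2^2 \right]
```

The IW variant presumably reweights such terms over the observed coordinates of each sample; the exact missing-data objectives are given in the paper.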
Rank2Reward demonstrates learning effective reward functions for robotic manipulation directly from passive video demonstrations. It leverages the temporal ordering of video frames to infer task progress, enabling robots to learn complex behaviors without manual reward engineering.
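The key trick is that temporal order is free supervision: train a network to rank later frames above earlier ones, then read its score as task progress. A sketch with an illustrative architecture and loss, not the exact Rank2Reward recipe:

```python
# Progress-ranking reward sketch (architecture and loss are illustrative).
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressNet(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, x):                  # x: (B, feat_dim) frame features
        return self.mlp(x).squeeze(-1)

def ranking_loss(net, video_feats):        # video_feats: (T, feat_dim)
    T = video_feats.shape[0]
    i, j = sorted(random.sample(range(T), 2))
    earlier, later = net(video_feats[i:i + 1]), net(video_feats[j:j + 1])
    # Logistic (Bradley-Terry) ranking: the later frame should score higher.
    return F.softplus(earlier - later).mean()

def reward(net, frame_feat):               # learned progress as the reward
    with torch.no_grad():
        return net(frame_feat.unsqueeze(0)).item()
```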
Researchers identify and address internal attention deficits within Large Vision-Language Models during multimodal in-context learning. They introduce CAMA, a training-free, plug-and-play method that dynamically modulates attention logits during inference, leading to an average accuracy increase of 2.96% across various VQA benchmarks and strong generalization to other tasks.
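Mechanically, modulation of this kind means nudging pre-softmax attention logits at the positions of in-context visual tokens. A heavily simplified, self-contained sketch; the bias value is an assumption, and how CAMA actually derives its modulation from the model's attention statistics is not reproduced here:

```python
# Inference-time attention-logit modulation sketch (illustrative bias).
import torch
import torch.nn.functional as F

def modulated_attention(q, k, v, visual_mask, bias=1.0):
    """q, k, v: (B, H, L, d); visual_mask: (L,) bool marking image-token keys.
    Adds `bias` to the pre-softmax logits of visual keys, then attends."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores + bias * visual_mask.to(scores.dtype)  # broadcast on keys
    return F.softmax(scores, dim=-1) @ v
```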