This research investigates whether In-Context Learning (ICL) in Large Language Models (LLMs) represents genuine learning, rigorously defining it within a PAC learning framework. The study demonstrates that while ICL improves with more examples (optimal at 50-100 shots), it exhibits considerable brittleness to out-of-distribution shifts and inconsistent generalization across formally similar tasks.
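The PAC framework referenced here is standard; as an illustrative textbook bound (not necessarily the paper's exact formalization), a finite hypothesis class $\mathcal{H}$ is PAC-learnable in the realizable case with sample complexity

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right),
```

i.e., the number of labeled examples needed to reach error $\epsilon$ with confidence $1-\delta$ grows only logarithmically in $\lvert\mathcal{H}\rvert$ and $1/\delta$ — one lens on why accuracy in such studies can saturate within tens of in-context demonstrations.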
UniTraj introduces a universal trajectory foundation model, trained on the new billion-scale, globally distributed WorldTrace dataset, to address limitations in task specificity, regional dependency, and data sensitivity for trajectory analysis. It achieves superior zero-shot and fine-tuned performance across recovery, prediction, classification, and generation tasks, for instance, reducing MAE by 32.73% on GeoLife for trajectory recovery compared to TrajBERT.
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
Gamma-ray bursts are the most luminous electromagnetic events in the universe. Their prompt gamma-ray emission has typical durations between a fraction of a second and several minutes. A rare subset of these events have durations in excess of a thousand seconds, referred to as ultra-long gamma-ray bursts. Here, we report the discovery of the longest gamma-ray burst ever seen with a ~25,000 s gamma-ray duration, GRB 250702B, and characterize this event using data from four instruments in the InterPlanetary Network and the Monitor of All-sky X-ray Image. We find a hard spectrum, subsecond variability, and high total energy, which are only known to arise from ultrarelativistic jets powered by a rapidly-spinning stellar-mass central engine. These properties and the extreme duration are together incompatible with all confirmed gamma-ray burst progenitors and nearly all models in the literature. This burst is naturally explained with the helium merger model, where a field binary ends when a black hole falls into a stripped star and proceeds to consume and explode it from within. Under this paradigm, GRB 250702B adds to the growing evidence that helium stars expand and that some ultra-long GRBs have similar evolutionary pathways as collapsars, stellar-mass gravitational wave sources, and potentially rare types of supernovae.
Since the publication of the first International AI Safety Report, AI capabilities have continued to improve across key domains. New training techniques that teach AI systems to reason step-by-step and inference-time enhancements have primarily driven these advances, rather than simply training larger models. As a result, general-purpose AI systems can solve more complex problems in a range of domains, from scientific research to software development. Their performance on benchmarks measuring coding, mathematics, and expert-level science question answering has continued to improve, though reliability challenges persist, with systems excelling on some tasks while failing completely on others. These capability improvements also have implications for multiple risks, including risks from biological weapons and cyber attacks. Finally, they pose new challenges for monitoring and controllability. This update examines how AI capabilities have improved since the first Report, then focuses on key risk areas where substantial new evidence warrants updated assessments.
AnyRIR, developed by researchers at Aalto University and the University of York, introduces a method for estimating room impulse responses (RIRs) robustly in uncontrolled, noisy environments by leveraging background music as the excitation signal. The approach, based on ℓ1-norm regression, achieved a -36.0 dB RIR estimation error in simulated non-stationary noise, outperforming conventional methods.
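The ℓ1-norm regression at the heart of this approach can be sketched as least-absolute-deviations deconvolution: estimate an FIR filter h minimizing ‖Xh − y‖₁, where X is the convolution matrix of the (music) excitation and y is the noisy recording. Below is a minimal self-contained sketch using iteratively reweighted least squares (IRLS); the solver choice, tap count, and toy signals are illustrative assumptions, not AnyRIR's actual pipeline.

```python
import numpy as np

def convolution_matrix(x, n_taps):
    """Toeplitz matrix X such that X @ h == np.convolve(x, h)[:len(x)]."""
    N = len(x)
    X = np.zeros((N, n_taps))
    for k in range(n_taps):
        X[k:, k] = x[:N - k]
    return X

def l1_fir_estimate(x, y, n_taps, n_iter=50, eps=1e-6):
    """Least-absolute-deviations FIR estimate via IRLS:
    minimize ||X h - y||_1, robust to sparse, non-stationary noise."""
    X = convolution_matrix(x, n_taps)
    h = np.linalg.lstsq(X, y, rcond=None)[0]          # L2 warm start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(X @ h - y), eps)  # IRLS weights ~ 1/|residual|
        Xw = X * w[:, None]
        h = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)[0]
    return h

# Toy check: recover a short impulse response despite sparse noise bursts.
rng = np.random.default_rng(0)
x = rng.standard_normal(400)                 # stand-in for the music excitation
h_true = np.array([1.0, 0.0, 0.5, 0.0, 0.25])
y = np.convolve(x, h_true)[:len(x)]
y[::37] += 3.0                               # sparse non-stationary interference
h_est = l1_fir_estimate(x, y, n_taps=5)
print(np.round(h_est, 2))
```

The ℓ1 objective is what confers robustness here: an ℓ2 fit would let the noise bursts bias every tap, whereas the reweighting drives their influence toward zero.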
Researchers performed the first calculation of light (photon) scattering on heavy dark matter particles, revealing non-zero cross-sections for both weakly interacting and purely gravitational dark matter. The study predicts distinct energy-dependent "coloring" effects and polarization signatures, and established an upper limit on dark matter mass below 5.0 × 10^19 GeV using Galactic Center gamma-ray observations.
The last two decades have seen quantum thermodynamics become a well-established field of research in its own right. In that time, it has demonstrated remarkably broad applicability, ranging from foundational advances in the understanding of how thermodynamic principles apply at the nano-scale and in the presence of quantum coherence, to providing a guiding framework for the development of efficient quantum devices. Exquisite levels of control have allowed state-of-the-art experimental platforms to explore energetics and thermodynamics at the smallest scales, which has in turn helped to drive theoretical advances. This Roadmap provides an overview of recent developments across many of the field's sub-disciplines, assessing the key challenges and future prospects and providing a guide for its near-term progress.
A comprehensive white paper from the GenAINet Initiative introduces Large Telecom Models (LTMs) as a novel framework for integrating AI into telecommunications infrastructure, providing a detailed roadmap for innovation while addressing critical challenges in scalability, hardware requirements, and regulatory compliance through insights from a diverse coalition of academic, industry and regulatory experts.
Immersion in virtual and augmented reality solutions relies on plausible spatial audio. However, plausibly representing a space for immersive audio often requires many individual acoustic measurements of source-microphone pairs with specialist spatial microphones, making the procedure time-consuming and expensive. In this study, we evaluate the plausibility of extrapolated and spatialised Room Impulse Responses (RIRs) using a 3-Alternative Forced Choice (3AFC) listening test. The stimuli comprised RIRs from three spaces convolved with speech, orchestral, and instrumental music. When asked to select which stimulus was artificial out of one extrapolated and two real stimuli, the 20 participants achieved an overall accuracy of 38% (5 percentage points above the expected guessing rate). Given this result, the study shows that it is possible to extrapolate plausible spatial RIRs from mono measurements, decreasing the need for time and specialist equipment in acoustic measurements.
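The headline statistic (38% detection accuracy from 20 listeners against a 33.3% 3AFC guessing rate) can be sanity-checked with an exact binomial tail. The trials-per-listener count is not given in this summary, so the total below is a hypothetical illustration:

```python
from math import comb

def binom_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical totals: the summary gives 38% accuracy and 20 listeners but not
# the trials per listener; assume 18 trials each (360 total) for illustration.
n_trials = 360
k_correct = round(0.38 * n_trials)    # 137 correct responses
p_chance = 1 / 3                      # 3AFC guessing rate
p_value = binom_sf(k_correct, n_trials, p_chance)
print(f"one-sided p = {p_value:.3f}")
```

Under these assumed totals the detection rate sits near the edge of statistical detectability, which is consistent with the paper's reading that the extrapolated RIRs are close to perceptually indistinguishable from real ones.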
A comprehensive evaluation framework reveals significant limitations in commonly used no-reference image quality metrics (NRIQMs) for medical image generation, demonstrating that upstream metrics often fail to detect clinically relevant issues and correlate poorly with downstream task performance across VAE, GAN, and DDPM architectures.
UVLLM introduces an automated framework for Register Transfer Level (RTL) hardware verification, integrating Large Language Models (LLMs) with the Universal Verification Methodology (UVM). The system achieves an average fix rate of 86.99% for syntax errors and 71.92% for functional errors, outperforming prior methods like MEIC by up to 36.3% and demonstrating a 10.42x speedup.
The Laplace-Beltrami operator (LBO) has established itself in the field of non-rigid shape analysis due to its many useful properties, such as being invariant under isometric transformation, having a countable eigensystem forming an orthonormal basis, and fully characterizing geodesic distances of the manifold. However, this invariance only holds under isometric deformations, which leads to a performance breakdown in many real-world applications. In recent years, emphasis has been placed on extracting optimal features using deep learning methods; however, spectral signatures still play a crucial role and add value. In this paper we take a step back, revisiting the LBO and proposing a supervised way to learn several operators on a manifold. Depending on the task, by applying these functions we can train the LBO eigenbasis to be more task-specific. The optimization of the LBO leads to enormous improvements to established descriptors such as the heat kernel signature in tasks such as retrieval, classification, segmentation, and correspondence, demonstrating the adaptability of the LBO eigenbasis to both global and highly local learning settings.
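For reference, the heat kernel signature mentioned above is computed directly from a Laplacian eigensystem as HKS(x, t) = Σ_k e^{−λ_k t} φ_k(x)². The sketch below uses a toy graph Laplacian of a cycle as a stand-in for the cotangent LBO of a real mesh; the mesh construction and time samples are illustrative assumptions:

```python
import numpy as np

def heat_kernel_signature(evals, evecs, times):
    """HKS(x, t) = sum_k exp(-lambda_k * t) * phi_k(x)^2,
    computed for all vertices x and all t at once; returns (n_vertices, n_times)."""
    return (evecs**2) @ np.exp(-np.outer(evals, times))

# Toy stand-in for the LBO: graph Laplacian of a 32-vertex cycle "mesh".
n = 32
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = A[idx, (idx - 1) % n] = 1.0
L = np.diag(A.sum(1)) - A
evals, evecs = np.linalg.eigh(L)

hks = heat_kernel_signature(evals, evecs, times=np.array([0.1, 1.0, 10.0]))
print(hks.shape)  # (32, 3)
```

Because the cycle is vertex-transitive, every row of `hks` is identical here; on a real mesh the rows differ and form the per-vertex descriptor that the paper's learned eigenbasis improves upon.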
A study objectively assessed the fairness and robustness of Large Language Models (LLMs) in reasoning tasks when queried in African American Vernacular English (AAVE) versus Standardized English (SE). It found that most LLMs experienced statistically significant performance drops, averaging over 10% relative reduction, on AAVE queries across various reasoning categories, with Chain of Thought and standardization prompting proving insufficient to close this gap.
The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, utilising LLMs to debug Register Transfer Level (RTL) code is still insufficient, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces a novel framework, Make Each Iteration Count (MEIC), which contrasts with traditional one-shot LLM-based debugging methods that heavily rely on prompt engineering, model tuning, and model training. MEIC utilises LLMs in an iterative process to overcome their limitations in RTL code debugging; it is suitable for identifying and correcting both syntax and function errors, while effectively managing the uncertainties inherent in LLM operations. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors. The experimental results demonstrate that the proposed debugging framework achieves a fix rate of 93% for syntax errors and 78% for function errors, with up to a 48x speedup in the debugging process compared with experienced engineers. The dataset and code repository: this https URL.
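The iterative process MEIC describes (simulate, feed the log to an LLM, apply the fix, repeat) can be sketched generically. The hooks `simulate` and `llm_fix` below are hypothetical stand-ins, not the paper's actual prompts or tooling:

```python
def iterative_debug(code, simulate, llm_fix, max_iters=10):
    """Iterative LLM-in-the-loop RTL repair in the spirit of MEIC (a sketch).
    simulate(code) -> (ok, log) wraps the EDA toolchain;
    llm_fix(code, log) -> code wraps the LLM repair step."""
    for attempt in range(max_iters):
        ok, log = simulate(code)
        if ok:
            return code, attempt        # fixed after `attempt` repair rounds
        code = llm_fix(code, log)
    return code, -1                     # unresolved within the iteration budget

# Toy demo with stub hooks: a "simulator" that only accepts complete modules.
buggy = "module counter(input clk);"
sim = lambda c: ("endmodule" in c, "syntax error: missing endmodule")
fix = lambda c, log: c + "\nendmodule"
fixed, n_rounds = iterative_debug(buggy, sim, fix)
print(n_rounds)  # 1
```

Bounding the loop with `max_iters` is one simple way to manage the uncertainty the abstract mentions: an LLM repair step is not guaranteed to converge, so the budget caps wasted simulation runs.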
Researchers propose a physics-guided motion loss that regularizes video diffusion models by enforcing physical plausibility for translation, rotation, and scaling directly in the frequency domain. This approach improves temporal consistency and motion quality in generated videos, achieving substantial gains across various metrics and strong user preference without requiring architectural changes to the generative models.
This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a focus on understanding and managing its risks. A diverse group of 75 AI experts contributed to this report, including an international Expert Advisory Panel nominated by 30 countries, the EU, and the UN. Led by the Chair, these independent experts collectively had full discretion over the report's content. The final report is available at arXiv:2501.17805
Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise from the considerable manual coding effort required, repetitive manual execution of multiple EDA tools, and the need for in-depth domain expertise to navigate complex verification flows. In this paper, we present UVM^2, an automated verification framework that leverages Large Language Models (LLMs) to generate UVM testbenches and iteratively refine them using coverage feedback, significantly reducing manual effort while maintaining rigorous verification quality. To evaluate UVM^2, we introduce a benchmark suite comprising Register Transfer Level (RTL) designs of up to 1.6K lines of code. Results show that UVM^2 reduces testbench setup time compared to experienced engineers, and achieves average code and function coverage of 87.44% and 89.58%, outperforming state-of-the-art solutions by 20.96% and 23.51%, respectively.
Emergent effects can arise in multi-agent systems (MAS) where execution is decentralized and reliant on local information. These effects may range from minor deviations in behavior to catastrophic system failures. To formally define these effects, we identify misalignments between the global inherent specification (the true specification) and its local approximation (such as the configuration of different reward components or observations). Using established safety terminology, we develop a framework to understand these emergent effects. To showcase the resulting implications, we use two broadly configurable exemplary gridworld scenarios, where insufficient specification leads to unintended behavior deviations when derived independently. Recognizing that a global adaptation might not always be feasible, we propose adjusting the underlying parameterizations to mitigate these issues, thereby improving the system's alignment and reducing the risk of emergent failures.
Dexterous in-hand manipulation remains a foundational challenge in robotics, with progress often constrained by the prevailing paradigm of imitating the human hand. This anthropomorphic approach creates two critical barriers: 1) it limits robotic capabilities to tasks humans can already perform, and 2) it makes data collection for learning-based methods exceedingly difficult. Both challenges are caused by traditional force-closure which requires coordinating complex, multi-point contacts based on friction, normal force, and gravity to grasp an object. This makes teleoperated demonstrations unstable and amplifies the sim-to-real gap for reinforcement learning. In this work, we propose a paradigm shift: moving away from replicating human mechanics toward the design of novel robotic embodiments. We introduce the Suction Leap-Hand (SLeap Hand), a multi-fingered hand featuring integrated fingertip suction cups that realize a new form of suction-enabled dexterity. By replacing complex force-closure grasps with stable, single-point adhesion, our design fundamentally simplifies in-hand teleoperation and facilitates the collection of high-quality demonstration data. More importantly, this suction-based embodiment unlocks a new class of dexterous skills that are difficult or even impossible for the human hand, such as one-handed paper cutting and in-hand writing. Our work demonstrates that by moving beyond anthropomorphic constraints, novel embodiments can not only lower the barrier for collecting robust manipulation data but also enable the stable, single-handed completion of tasks that would typically require two human hands. Our webpage is this https URL.