alphaXiv


Queen Mary University of London

2,137

19 Jul 2024

computer-science computation-and-language computers-and-society

MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions

ISI Foundation

Queen Mary University of London

MoralBERT introduces a suite of fine-tuned BERT models that accurately identify moral values in social media discussions, leveraging the Moral Foundations Theory across diverse datasets. The models, particularly MoralBERT_adv with domain-adversarial training, significantly outperform lexicon-based methods, traditional machine learning, and zero-shot GPT-4 for in-domain predictions, achieving up to 32% higher F1 scores.

2,033

31 Jul 2024

computer-science computer-science-and-game-theory social-and-information-networks

Evolutionary game selection creates cooperative environments

Queen Mary University of London Complexity Science Hub Vienna University of Zaragoza Central European University Vienna Università di Catania

This research introduces a co-evolutionary framework where individual strategies and the game environments they embody undergo simultaneous evolutionary selection. It demonstrates how such dynamic environments, particularly when structured by complex networks, foster the emergence and maintenance of cooperative behavior from various initial conditions.

2,124

02 Dec 2024

ai-for-health computer-science computation-and-language

Exploring Long-Term Prediction of Type 2 Diabetes Microvascular Complications

Allen Institute for AI

Queen Mary University of London

Researchers from Queen Mary University of London and the Allen Institute for AI developed a code-agnostic approach using clinical language models to predict long-term Type 2 Diabetes microvascular complications from electronic health records. Their text-based models generally outperformed code-based methods, achieving a Micro-AUPRC of 0.51 for 5-year predictions, with performance further improving to 0.66 when recent clinical history was prioritized.

4,779

06 May 2025

agents chain-of-thought computer-science

RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation

Queen Mary University of London Beijing University of Post and Communications

Retrieval-Augmented Generation with Model Context Protocol (RAG-MCP) enables large language models to efficiently select from large sets of external tools by retrieving only relevant tool descriptions, reducing prompt token usage by 73% while maintaining tool selection accuracy across increasing tool pool sizes.
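The retrieval step summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the bag-of-words cosine scorer, the `TOOLS` registry, and the function names `retrieve_tools` and `build_prompt` are all hypothetical stand-ins, chosen only to show how keeping the top-k tool descriptions keeps the prompt small as the tool pool grows.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_tools(query, tools, k=2):
    """Return the names of the k tools whose descriptions best match the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(desc))), name) for name, desc in tools.items()]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(query, tools, k=2):
    """Build a prompt that lists only the retrieved tool descriptions."""
    selected = retrieve_tools(query, tools, k)
    lines = [f"- {name}: {tools[name]}" for name in selected]
    return "Available tools:\n" + "\n".join(lines) + f"\n\nUser request: {query}"


# Hypothetical tool registry, used only for illustration.
TOOLS = {
    "weather_lookup": "Get the current weather forecast for a city",
    "currency_convert": "Convert an amount of money between two currencies",
    "calendar_create": "Create a calendar event with a title and a date",
    "file_search": "Search files on disk by name or content",
}

if __name__ == "__main__":
    print(build_prompt("What is the weather forecast in London?", TOOLS, k=1))
```

A real system would presumably replace the bag-of-words scorer with a learned embedding model; the sketch only shows where the prompt savings come from.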

565

19 May 2025

chain-of-thought computer-science computation-and-language

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Chinese Academy of Sciences

Shanghai Jiao Tong University

Tsinghua University

ByteDance

The University of Texas at Austin

Nanyang Technological University

HKUST

Queen Mary University of London Shanghai Innovation Institute 2077.AI

柯良李

MMAR introduces the first comprehensive benchmark designed to assess complex reasoning capabilities across various audio modalities, including speech, sound, music, and their combinations. Experiments on this challenging dataset reveal that most open-source audio models perform near random chance, with the best performing model, Gemini 2.0 Flash, achieving approximately 62% accuracy, highlighting a substantial gap in current audio reasoning abilities.

100

1,223

01 Apr 2025

computer-science artificial-intelligence computation-and-language

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Alex Gu

Wen-Ding Li

BigCodeBench is a new benchmark that evaluates Large Language Models on their ability to generate Python code requiring diverse function calls and complex instructions, revealing that current models like GPT-4o achieve a maximum of 60% accuracy on these challenging tasks, significantly lagging human performance.

296

20 May 2025

agents computer-science human-computer-interaction

ViMo: A Generative Visual GUI World Model for App Agents

Huawei Noah’s Ark Lab

University College London

University of Oxford

Queen Mary University of London

ViMo introduces the first generative visual GUI world model that predicts future application states as high-fidelity images, decoupling graphic and text generation to overcome pixel-level text rendering challenges. This model enhances App agents' decision-making by providing visual foresight, leading to improved task completion and action accuracy.

1,145

13 Dec 2023

computer-science artificial-intelligence computation-and-language

StarCoder: may the source be with you!

Alex Gu

Christopher Akiki

StarCoder and StarCoderBase are large language models for code developed by The BigCode community, demonstrating state-of-the-art performance among open-access models on Python code generation, achieving 33.6% pass@1 on HumanEval, and strong multi-language capabilities, all while integrating responsible AI practices.

673

07 Nov 2025

computer-science graphics

CASteer: Steering Diffusion Models for Controllable Generation

Imperial College London

University College London

Queen Mary University of London CASIA Huawei Noah’s Ark

Tatiana Gaintseva

Diffusion models have transformed image generation, yet controlling their outputs to reliably erase undesired concepts remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for concept erasure in diffusion models using steering vectors to influence hidden representations dynamically. CASteer precomputes concept-specific steering vectors by averaging neural activations from images generated for each target concept. During inference, it dynamically applies these vectors to suppress undesired concepts only when they appear, ensuring that unrelated regions remain unaffected. This selective activation enables precise, context-aware erasure without degrading overall image quality. This approach achieves effective removal of harmful or unwanted content across a wide range of visual concepts, all without model retraining. CASteer outperforms state-of-the-art concept erasure techniques while preserving unrelated content and minimizing unintended effects. Pseudocode is provided in the supplementary.
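The steering idea in this abstract can be sketched on plain vectors. This is a simplified illustration rather than the authors' procedure: it assumes the steering vector is the normalized difference between mean activations with and without the concept, and that suppression removes the component along that direction only when it is positive. The function names and the list-of-floats activation format are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(with_concept, without_concept):
    """Unit direction from 'concept absent' to 'concept present' activations.

    Assumption: difference of means, normalized to unit length.
    """
    pos = mean_vector(with_concept)
    neg = mean_vector(without_concept)
    diff = [p - q for p, q in zip(pos, neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("activations with and without the concept have the same mean")
    return [d / norm for d in diff]


def steer(activation, direction):
    """Remove the concept component from an activation only if it is present.

    A non-positive projection is treated as 'concept absent', so the
    activation is returned unchanged.
    """
    score = sum(a * d for a, d in zip(activation, direction))
    if score <= 0:
        return list(activation)
    return [a - score * d for a, d in zip(activation, direction)]


if __name__ == "__main__":
    direction = steering_vector([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
    print(steer([5.0, 3.0], direction))   # [0.0, 3.0]
    print(steer([-1.0, 2.0], direction))  # [-1.0, 2.0]
```

The conditional in `steer` mirrors the "only when they appear" behavior described above: activations that do not point along the concept direction pass through untouched.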

438

18 Dec 2019

computer-science computer-vision-security computer-vision-and-pattern-recognition

Omni-Scale Feature Learning for Person Re-Identification

University of Surrey

Queen Mary University of London Samsung

As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales. We call features of both homogeneous and heterogeneous scales omni-scale features. In this paper, a novel deep ReID CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning. This is achieved by designing a residual block composed of multiple convolutional streams, each detecting features at a certain scale. Importantly, a novel unified aggregation gate is introduced to dynamically fuse multi-scale features with input-dependent channel-wise weights. To efficiently learn spatial-channel correlations and avoid overfitting, the building block uses pointwise and depthwise convolutions. By stacking such blocks layer by layer, our OSNet is extremely lightweight and can be trained from scratch on existing ReID benchmarks. Despite its small model size, OSNet achieves state-of-the-art performance on six person ReID datasets, outperforming most large-sized models, often by a clear margin. Code and models are available at: this https URL.
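The gated fusion of multi-scale streams can be sketched in a few lines. This is a simplified illustration, not the OSNet code: it assumes the gate is global average pooling followed by a single linear layer and a sigmoid, shared across all streams, and it represents each feature map as a list of channels holding flattened spatial values. The names `gate` and `aggregate` are hypothetical.

```python
import math


def gate(feature_map, weights, bias):
    """Compute one weight in (0, 1) per channel from the stream's own content.

    feature_map: list of channels, each a list of spatial values.
    weights, bias: parameters of an assumed single linear layer (C x C and C).
    """
    pooled = [sum(channel) / len(channel) for channel in feature_map]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def aggregate(streams, weights, bias):
    """Fuse streams as the sum of each stream scaled by its channel-wise gate."""
    fused = [[0.0] * len(channel) for channel in streams[0]]
    for feature_map in streams:
        g = gate(feature_map, weights, bias)  # the same gate parameters for every stream
        for c, channel in enumerate(feature_map):
            for i, value in enumerate(channel):
                fused[c][i] += g[c] * value
    return fused


if __name__ == "__main__":
    # Two single-channel streams with two spatial positions each.
    streams = [[[2.0, 4.0]], [[6.0, 8.0]]]
    # With zero weights and bias the gate is 0.5 everywhere.
    print(aggregate(streams, weights=[[0.0]], bias=[0.0]))  # [[4.0, 6.0]]
```

Because the gate is computed from each stream's pooled features, the fusion weights change with the input, which is what distinguishes this from a fixed weighted sum of scales.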

4,629

638

02 Dec 2025

computer-science artificial-intelligence computation-and-language

OmniBench: Towards The Future of Universal Omni-Language Models

University of Manchester

Nanjing University

Queen Mary University of London

Dartmouth College 01.ai Hong Kong University of Science and Technology

Yizhi Li

星威曲

Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored. We introduce OmniBench, a novel benchmark designed to evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs). OmniBench features high-quality human annotations that require integrated understanding across all modalities. Our evaluation reveals that: i) open-source OLMs show significant limitations in instruction-following and reasoning in tri-modal contexts; and ii) most baseline models perform poorly (around 50% accuracy) even with textual alternatives to image/audio inputs. To address these limitations, we develop OmniInstruct, a 96K-sample instruction tuning dataset for training OLMs. We advocate for developing more robust tri-modal integration techniques and training strategies to enhance OLM performance. Code and data can be found at our repo (this https URL).

5,235

08 Feb 2025

statistical-mechanics general-relativity-and-quantum-cosmology high-energy-physics-theory

Gravity from entropy

Queen Mary University of London

Ginestra Bianconi

Ginestra Bianconi's work proposes a theory where gravity arises from a Lorentz-invariant entropic action, defined as the quantum relative entropy between the spacetime metric and a matter-induced metric. This framework yields modified Einstein equations that are at most second-order in derivatives, naturally recovering standard General Relativity in a low coupling limit and demonstrating the emergence of a positive cosmological constant dependent on an auxiliary field.

523

04 Jan 2022

ai-for-health computer-science artificial-intelligence

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

Queen Mary University of London

Amir

This book provides a structured and comprehensive guide for preparing for deep learning job interviews and graduate exams, featuring hundreds of fully-solved problems. Authored by Shlomo Kashani and edited by Amir Ivry, the resource aims to deepen candidates' conceptual understanding and practical problem-solving skills, enabling them to confidently articulate complex deep learning concepts.

2,041

14 Mar 2025

chain-of-thought computer-science computer-vision-and-pattern-recognition

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

Nanjing University

Nanyang Technological University

Queen Mary University of London

A comprehensive benchmark and evaluation framework for assessing video-language models' spatio-temporal reasoning capabilities, introducing V-STaR dataset and the Reverse Spatio-Temporal Reasoning task that reveals models' ability to ground "what," "when," and "where" aspects of video understanding through coarse-to-fine questioning chains.

444

27 Dec 2024

computer-science artificial-intelligence computation-and-language

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

University of Waterloo

Carnegie Mellon University

New York University Beijing Academy of Artificial Intelligence University of Sheffield

HKUST

Queen Mary University of London

Dartmouth College University of Michigan - Ann Arbor

Yizhi Li

MERT introduces a general-purpose, computationally affordable, self-supervised acoustic music understanding model that employs a novel multi-task framework with both acoustic and music-specific teachers. It achieves state-of-the-art performance across 14 diverse Music Information Retrieval tasks while being significantly more efficient than prior large generative models.

413

1,538

24 Jun 2025

cosmology-and-nongalactic-astrophysics general-relativity-and-quantum-cosmology high-energy-physics-phenomenology

The Atacama Cosmology Telescope: DR6 Constraints on Extended Cosmological Models

University of Toronto

California Institute of Technology

University of Pittsburgh

Carnegie Mellon University

Stanford University

Cornell University

McGill University

University of British Columbia

University of Pennsylvania

Johns Hopkins University

Arizona State University

Princeton University Cardiff University

Queen Mary University of London

Flatiron Institute NIST University of Cape Town University of KwaZulu-Natal WMAP

Utilizing the Atacama Cosmology Telescope's Data Release 6, researchers rigorously tested the standard "Lambda Cold Dark Matter" (ΛCDM) cosmological model and constrained numerous extensions to it, finding continued consistency with ΛCDM and setting the tightest limits to date on many fundamental physics parameters, while observing no statistical preference for models designed to alleviate cosmological tensions like the Hubble or S₈ discrepancies.

20 Nov 2025

agentic-frameworks computer-science artificial-intelligence

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization

X-Humanoid

Imperial College London

University of Manchester

Fudan University

Protecting Your LLMs with Information Bottleneck

Nanjing University

Tsinghua University Pennsylvania State University NEC Laboratories America

Queen Mary University of London

Jiang Bian

The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector), a defense mechanism grounded in the information bottleneck principle, and we modify the objective to avoid trivial solutions. The IBProtector selectively compresses and perturbs prompts, facilitated by a lightweight and trainable extractor, preserving only essential information for the target LLMs to respond with the expected answer. Moreover, we further consider a situation where the gradient is not visible to be compatible with any LLM. Our empirical evaluations show that IBProtector outperforms current defense methods in mitigating jailbreak attempts, without overly affecting response quality or inference speed. Its effectiveness and adaptability across various attack methods and target LLMs underscore the potential of IBProtector as a novel, transferable defense that bolsters the security of LLMs without requiring modifications to the underlying models.

22 Mar 2023

computer-science contrastive-learning computer-vision-and-pattern-recognition

MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset

Queen Mary University of London

Deep learning has achieved great success in recent years with the aid of advanced neural network structures and large-scale human-annotated datasets. However, it is often costly and difficult to accurately and efficiently annotate large-scale datasets, especially for some specialized domains where fine-grained labels are required. In this setting, coarse labels are much easier to acquire as they do not require expert knowledge. In this work, we propose a contrastive learning method, called Masked Contrastive learning (MaskCon), to address the under-explored problem setting where we learn with a coarse-labelled dataset in order to address a finer labelling problem. More specifically, within the contrastive learning framework, for each sample our method generates soft labels with the aid of coarse labels against other samples and another augmented view of the sample in question. In contrast to self-supervised contrastive learning, where only the sample's augmentations are considered hard positives, and supervised contrastive learning, where only samples with the same coarse labels are considered hard positives, we propose soft labels based on sample distances that are masked by the coarse labels. This allows us to utilize both inter-sample relations and coarse labels. We demonstrate that our method can obtain as special cases many existing state-of-the-art works and that it provides tighter bounds on the generalization error. Experimentally, our method achieves significant improvement over the current state-of-the-art on various datasets, including CIFAR10, CIFAR100, ImageNet-1K, Stanford Online Products and Stanford Cars196. Code and annotations are available at this https URL.
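The masking of distance-based soft labels by coarse labels can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it covers only the inter-sample part (the augmented view is left out), it uses a temperature-scaled softmax over similarities, and the function name, argument layout, and default temperature are hypothetical.

```python
import math


def masked_soft_labels(similarities, coarse_labels, query_label, temperature=0.1):
    """Soft labels over candidate samples for one query.

    similarities: similarity of the query to each candidate (higher is closer).
    coarse_labels: coarse label of each candidate.
    query_label: coarse label of the query.
    Candidates with a different coarse label are masked to weight 0; the rest
    receive a softmax over similarity / temperature.
    """
    keep = [label == query_label for label in coarse_labels]
    if not any(keep):
        return [0.0] * len(similarities)
    # Subtract the largest kept similarity for numerical stability.
    top = max(s for s, k in zip(similarities, keep) if k)
    weights = [
        math.exp((s - top) / temperature) if k else 0.0
        for s, k in zip(similarities, keep)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    sims = [0.9, 0.8, 0.9]
    labels = ["dog", "cat", "dog"]
    print(masked_soft_labels(sims, labels, query_label="dog"))  # [0.5, 0.0, 0.5]
```

In this sketch, a candidate from a different coarse class never counts as a positive however similar it looks, while candidates within the same coarse class are weighted by how close they are to the query.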

20 Aug 2025

computer-science robotics

Dynamic Risk-Aware MPPI for Mobile Robots in Crowds via Efficient Monte Carlo Approximations

TU Delft

Queen Mary University of London Damen Naval

The Dynamic Risk-Aware MPPI (DRA-MPPI) framework enables mobile robots to navigate crowded environments by efficiently approximating joint collision probabilities from multi-modal human movement predictions. This method maintains high success rates (98-99%) and low collision probabilities while preserving operational efficiency in both simulations and real-robot deployments.
