The Gravity Spy project aims to uncover the origins of glitches, transient bursts of noise that hamper analysis of gravitational-wave data. By combining the work of citizen-science volunteers with machine-learning algorithms, the Gravity Spy project enables reliable classification of glitches. Citizen science and machine learning are intrinsically coupled within the Gravity Spy framework: machine-learning classifications provide a rapid first-pass classification of the dataset and enable tiered volunteer training, while volunteer classifications verify the machine classifications, bolster the machine-learning training set, and identify new morphological classes of glitches. These classifications are now routinely used in studies characterizing the performance of the LIGO gravitational-wave detectors. Providing the volunteers with a training framework that teaches them to classify a wide range of glitches, as well as additional tools to aid their investigations of interesting glitches, empowers them to discover new classes of glitches. This demonstrates that, when given suitable support, volunteers can go beyond simple classification tasks and identify new features in data at a level comparable to domain experts. The Gravity Spy project is now providing volunteers with more complicated data, including auxiliary monitors of the detector, to identify the root causes of glitches.
APACE is a computational framework that optimizes AlphaFold2 for supercomputing environments, significantly accelerating protein structure prediction. The system delivers speedups of up to two orders of magnitude and efficiently generates diverse conformational ensembles, transforming prediction times from weeks to minutes.
Researchers from Peking University, Princeton University, and ByteDance introduce MMaDA-Parallel, a multimodal large diffusion language model that generates text and images in parallel, addressing error propagation in sequential thinking-aware models. It achieved the highest output alignment score on a new ParaBench benchmark by enabling continuous, bidirectional interaction between modalities and optimizing with trajectory-level reinforcement learning.
We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or to carry out an end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities by presenting multiple papers it generated in scientific disciplines such as astrophysics, biology, biophysics, biomedical informatics, chemistry, materials science, mathematical physics, medicine, neuroscience, and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this with a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at this https URL. A Denario demo can also be run directly on the web at this https URL, and the full app will be deployed on the cloud.
A self-supervised framework, LightReasoner, enhances large language model reasoning by deriving contrastive supervision from the behavioral differences between an expert and a weaker amateur model. The method improves mathematical reasoning accuracy by up to 28.1% on GSM8K, requiring 90% less training time and 99% fewer tuned tokens compared to existing fine-tuning techniques.
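As a rough illustration of the expert-amateur contrast that LightReasoner builds on, the sketch below flags the reasoning steps where an expert's and an amateur's next-token distributions diverge most and keeps the expert's distribution as a soft target. The KL-based selection and the toy distributions are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of contrastive expert-amateur supervision in the spirit of
# LightReasoner. The selection rule and weighting are illustrative assumptions.
import numpy as np

def kl(p, q, eps=1e-9):
    """KL divergence between two next-token distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def contrastive_targets(expert_probs, amateur_probs, top_frac=0.2):
    """Keep only the steps where expert and amateur disagree most.

    expert_probs, amateur_probs: lists of per-step next-token distributions.
    Returns (step_index, expert_distribution) pairs used as soft targets.
    """
    divergences = [kl(p, q) for p, q in zip(expert_probs, amateur_probs)]
    k = max(1, int(top_frac * len(divergences)))
    critical = np.argsort(divergences)[-k:]          # most informative steps
    return [(int(i), expert_probs[i]) for i in critical]

# Toy example: 4 reasoning steps over a 3-token vocabulary.
expert = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.9, 0.05, 0.05]]
amateur = [[0.6, 0.3, 0.1], [0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.4, 0.3]]
print(contrastive_targets(expert, amateur))
```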
This research introduces WORKFORCE, a modular multi-agent inference architecture that decouples planning from execution, and OPTIMIZED WORKFORCE LEARNING (OWL), a training paradigm focused on a domain-agnostic planner. The system achieved 69.70% accuracy on the GAIA benchmark, setting a new open-source state-of-the-art and outperforming commercial baselines like OpenAI's Deep Research.
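A minimal sketch of the planner/executor decoupling described above, assuming a hypothetical worker registry and a hand-written routing rule in place of the trained domain-agnostic planner:

```python
# Hedged sketch of the planner/worker decoupling described for WORKFORCE.
# The worker registry and routing logic are illustrative stand-ins.
def planner(task):
    """Split a task into (worker_name, subtask) pairs; a stand-in for the
    trained domain-agnostic planner."""
    if "search" in task:
        return [("web_worker", task), ("writer_worker", "summarize findings")]
    return [("coder_worker", task)]

WORKERS = {
    "web_worker": lambda sub: f"[web results for: {sub}]",
    "coder_worker": lambda sub: f"[code solving: {sub}]",
    "writer_worker": lambda sub: f"[summary of: {sub}]",
}

def run(task):
    # Execution is independent of planning: workers can be swapped or retrained
    # without touching the planner, which is the point of the decoupling.
    return [WORKERS[name](sub) for name, sub in planner(task)]

print(run("search for the GAIA benchmark leaderboard"))
```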
CacheGen optimizes Large Language Model serving by compressing and streaming Key-Value (KV) caches, addressing the network bottleneck in fetching long contexts. This system reduces Time-to-First-Token (TTFT) by 3.1-4.7x and the KV cache size by 3.5-4.3x with marginal impact on model quality.
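The sketch below illustrates the general idea of compressing a KV cache and streaming it in chunks so transfer can overlap with decoding; the uniform 8-bit quantization and chunk layout are assumptions for illustration, not CacheGen's actual codec.

```python
# Hedged sketch: quantize KV-cache tensors and stream them chunk by chunk.
import numpy as np

def quantize(kv, bits=8):
    """Uniformly quantize a float KV tensor to `bits` and return codes + params."""
    lo, hi = kv.min(), kv.max()
    scale = (hi - lo) / (2**bits - 1) or 1.0
    codes = np.round((kv - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

def stream_kv(kv, chunk_tokens=512):
    """Yield compressed chunks along the token dimension."""
    for start in range(0, kv.shape[0], chunk_tokens):
        yield quantize(kv[start:start + chunk_tokens])

# Toy KV cache: 2048 tokens x 64 head-dim values.
kv_cache = np.random.randn(2048, 64).astype(np.float32)
restored = np.concatenate([dequantize(*chunk) for chunk in stream_kv(kv_cache)])
print("max reconstruction error:", float(np.abs(restored - kv_cache).max()))
```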
The emergence of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, correct semantic alignment, which matches the instructions to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, which prevents models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework. AEPO employs a multi-answer generation strategy to enforce broader exploration, which is then guided by a theoretically grounded Adaptive Exploration Reward (AER) function derived from first principles of efficiency, η = U/C. Our AEPO-trained models, InfiGUI-G1-3B and InfiGUI-G1-7B, establish new state-of-the-art results across multiple challenging GUI grounding benchmarks, achieving significant relative improvements of up to 9.0% over the naive RLVR baseline on benchmarks designed to test generalization and semantic understanding. Resources are available at this https URL.
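To make the efficiency principle η = U/C concrete, here is a minimal sketch of an exploration reward for a multi-answer rollout, assuming a simple hit-or-miss utility and a per-sample cost; the actual AER derivation in the paper is more involved.

```python
# Hedged sketch of an efficiency-style reward eta = U / C. The utility and
# cost definitions below are illustrative assumptions, not the paper's exact AER.
def adaptive_exploration_reward(predictions, target, cost_per_sample=1.0):
    """Score a multi-answer rollout for GUI grounding.

    predictions: list of predicted element IDs from one multi-answer generation.
    target: the functionally correct element ID.
    Utility U is 1 if any prediction hits the target, 0 otherwise;
    cost C grows with the number of sampled answers.
    """
    utility = 1.0 if target in predictions else 0.0
    cost = cost_per_sample * max(1, len(predictions))
    return utility / cost  # eta = U / C: hitting with fewer samples pays more

# A rollout that finds the right element in 2 guesses beats one needing 5.
print(adaptive_exploration_reward(["btn_submit", "btn_cancel"], "btn_submit"))
print(adaptive_exploration_reward(["a", "b", "c", "d", "btn_submit"], "btn_submit"))
```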
This research demonstrates the spontaneous emergence of well-formed, self-replicating programs from random, non-replicating code in various computational environments, including minimalist languages and real-world instruction sets, without explicit fitness functions. The study finds this emergence is primarily driven by self-modification and interaction, not solely random mutations, and leads to subsequent complex dynamics like competition and co-existence.
SQLENS, an end-to-end framework from AWS, The University of Chicago, and MIT, addresses the issue of semantically incorrect SQL queries generated by Large Language Models (LLMs) by integrating diverse database and LLM-based error signals for fine-grained detection and iterative correction. The framework boosts Text-to-SQL system execution accuracy on benchmarks like BIRD by up to 20.50% and achieves an F1 score of 78.88 for error detection.
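As a sketch of what combining heterogeneous error signals might look like, the snippet below scores a generated query with a database-side signal (empty results), a schema signal (unknown columns), and an LLM-judge score; the specific signals and weights are illustrative assumptions, not SQLENS's detectors.

```python
# Hedged sketch of aggregating error signals for a suspicious LLM-generated SQL query.
def signal_empty_result(rows):
    """Empty results often indicate a semantically wrong (if executable) query."""
    return 1.0 if not rows else 0.0

def signal_unknown_columns(referenced_columns, schema_columns):
    """Columns referenced by the query but absent from the schema."""
    return 1.0 if set(referenced_columns) - set(schema_columns) else 0.0

def suspicion_score(rows, referenced_columns, schema_columns, llm_judge_score):
    """Weighted combination of database-side and LLM-side signals."""
    signals = [
        (0.4, signal_empty_result(rows)),
        (0.4, signal_unknown_columns(referenced_columns, schema_columns)),
        (0.2, llm_judge_score),   # e.g., an LLM's estimate that the query is wrong
    ]
    return sum(w * s for w, s in signals)

# A query that returned nothing and references a column missing from the schema.
print(suspicion_score([], ["customer_id", "total_spend"],
                      ["customer_id", "amount"], llm_judge_score=0.7))
```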
Current reinforcement learning (RL) frameworks for large language model (LLM) post-training typically assume a fixed prompt distribution, which is sub-optimal and bottlenecks scalability. Prior works have explored prompt evolution, but are often limited to the supervised fine-tuning stage, and prompts are sampled and evolved uniformly, without guiding signals. This empirical work presents a paradigm shift: Evolving Alignment via Asymmetric Self-Play (eva), which casts post-training as an infinite game with regret-based signals for two players: (i) a creator, who strategically samples and creates new informative prompts, and (ii) a solver, who learns to produce preferred responses. eva is the first method that allows language models to adaptively create training prompts in both offline and online RL post-training. The design is simple, easy to use, yet remarkably effective: eva sets a new SOTA on challenging benchmarks without any extra human prompts, e.g., it boosts the win rate of gemma-2-9b-it on Arena-Hard from 51.6% to 60.1% for DPO and from 52.6% to 62.4% for RLOO, surpassing claude-3-opus and catching up to gemini-1.5-pro, both of which are orders of magnitude larger. Extensive experiments show eva can create effective RL curricula and is robust across ablations. We believe adaptively evolving prompts are key to designing the next-generation RL post-training scheme.
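A minimal sketch of the creator/solver loop, using a best-minus-mean reward gap as a stand-in regret signal and a placeholder prompt-evolution step; eva's actual creator and solver updates (e.g., DPO or RLOO training) are omitted.

```python
# Hedged sketch of asymmetric self-play over prompts. The regret proxy and the
# evolution step are illustrative stand-ins, not eva's exact method.
import random

def regret(prompt, solver, reward_fn, n=4):
    """Gap between the best and the average sampled response reward:
    a proxy for how informative (learnable-but-not-solved) a prompt is."""
    rewards = [reward_fn(prompt, solver(prompt)) for _ in range(n)]
    return max(rewards) - sum(rewards) / len(rewards)

def eva_step(prompt_pool, solver, reward_fn, evolve, top_k=2):
    # Creator: pick the highest-regret prompts and evolve new variants from them.
    scored = sorted(prompt_pool, key=lambda p: regret(p, solver, reward_fn), reverse=True)
    new_prompts = [evolve(p) for p in scored[:top_k]]
    # Solver: would now be trained on the informative prompts (training omitted).
    return prompt_pool + new_prompts

# Toy instantiation with random stand-ins for solver, reward, and evolution.
pool = ["prove 2+2=4", "sort a list", "explain entropy"]
pool = eva_step(pool,
                solver=lambda p: random.random(),
                reward_fn=lambda p, r: r,
                evolve=lambda p: p + " (harder variant)")
print(pool)
```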
This research proposes a differentiable framework for learning causal structures from general binary data, utilizing the multivariate Bernoulli distribution to capture arbitrary dependencies. It demonstrates that while exact DAGs are non-identifiable, causal structures are identifiable up to Markov equivalence under a sparsity assumption. The developed method, BiNOTEARS, shows improved accuracy over baselines on synthetic datasets with complex interactions and a real-world biological network.
This research investigates whether large language model agents can simulate human trust behavior, introducing the concept of 'behavioral alignment' to assess their capacity to mirror human conduct and reasoning. The study finds that GPT-4 agents, in particular, exhibit high behavioral alignment with humans across various trust-related scenarios, including reciprocity, risk perception, and behavioral dynamics over time.
Quantum error correction is necessary to perform large-scale quantum computation, but requires extremely large overheads in both space and time. High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit numbers, but performing computation while maintaining low space cost has required serialization of operations and extra time costs. In this work, we design fast and parallelizable logical gates for qLDPC codes, and demonstrate their utility for key algorithmic subroutines such as the quantum adder. Our gate gadgets utilize transversal logical CNOTs between a data qLDPC code and a suitably constructed ancilla code to perform parallel Pauli product measurements (PPMs) on the data logical qubits. For hypergraph product codes, we show that the ancilla can be constructed by simply modifying the base classical codes of the data code, achieving parallel PPMs on a subgrid of the logical qubits with a lower space-time cost than existing schemes for an important class of circuits. Generalizations to 3D and 4D homological product codes further feature fast PPMs in constant depth. While prior work on qLDPC codes has focused on individual logical gates, we initiate the study of fault-tolerant compilation with our expanded set of native qLDPC code operations, constructing algorithmic primitives for preparing k-qubit GHZ states and distilling/teleporting k magic states with O(1) space overhead in O(1) and O(√k log k) logical cycles, respectively. We further generalize this to key algorithmic subroutines, demonstrating the efficient implementation of quantum adders using parallel operations. Our constructions are naturally compatible with reconfigurable architectures such as neutral atom arrays, paving the way to large-scale quantum computation with low space and time overheads.
A new framework, "executable counterfactuals," rigorously evaluates and enhances large language models' causal reasoning by requiring them to perform abduction, intervention, and prediction. Models trained with reinforcement learning from verifiable rewards consistently generalize this reasoning to out-of-distribution code and math problems, unlike those trained with supervised finetuning, despite all models showing a significant accuracy drop when required to infer latent variables.
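The abduction-intervention-prediction pattern on executable code can be illustrated with a toy structural program (invented here for illustration): infer the latent noise from an observation, intervene on the input, and re-run.

```python
# Hedged sketch of the abduction -> intervention -> prediction pattern that
# "executable counterfactuals" test, on a toy program invented for illustration.
def program(x, noise):
    y = 2 * x + noise          # structural equation with a latent noise term
    return y

# Observed: x = 3 produced y = 8.
observed_x, observed_y = 3, 8

# Abduction: infer the latent noise consistent with the observation.
noise = observed_y - 2 * observed_x        # noise = 2

# Intervention: set x to a counterfactual value.
counterfactual_x = 5

# Prediction: re-run the program with the inferred noise.
print(program(counterfactual_x, noise))    # -> 12
```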
Existing approaches to differentiable structure learning of directed acyclic graphs (DAGs) rely on strong identifiability assumptions in order to guarantee that global minimizers of the acyclicity-constrained optimization problem identify the true DAG. Moreover, it has been observed empirically that the optimizer may exploit undesirable artifacts in the loss function. We explain and remedy these issues by studying the behavior of differentiable acyclicity-constrained programs under general likelihoods with multiple global minimizers. By carefully regularizing the likelihood, it is possible to identify the sparsest model in the Markov equivalence class, even in the absence of an identifiable parametrization. We first study the Gaussian case in detail, showing how proper regularization of the likelihood defines a score that identifies the sparsest model; assuming faithfulness, it also recovers the Markov equivalence class. These results are then generalized to general models and likelihoods, where the same claims hold. The theoretical results are validated empirically, showing that this can be done using standard gradient-based optimizers, thus paving the way for differentiable structure learning under general models and losses.
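For concreteness, a common instantiation of such acyclicity-constrained scores is sketched below: a Gaussian least-squares likelihood with an L1 sparsity penalty and the standard differentiable acyclicity function h(W) = tr(exp(W ∘ W)) − d. The particular regularization analyzed in the paper is not reproduced here.

```python
# Hedged sketch of a sparsity-regularized Gaussian score with the standard
# differentiable acyclicity penalty; an illustrative objective, not the paper's.
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, which is 0 exactly when W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

def regularized_score(W, X, lam=0.1):
    """Gaussian least-squares likelihood term plus an L1 sparsity penalty."""
    n = X.shape[0]
    residual = X - X @ W
    nll = 0.5 / n * np.sum(residual ** 2)
    return nll + lam * np.abs(W).sum()

# Toy 3-variable data generated from the chain X0 -> X1 -> X2.
rng = np.random.default_rng(0)
X0 = rng.normal(size=1000)
X1 = 0.8 * X0 + rng.normal(size=1000)
X2 = 0.5 * X1 + rng.normal(size=1000)
X = np.column_stack([X0, X1, X2])

W_chain = np.array([[0, 0.8, 0], [0, 0, 0.5], [0, 0, 0]])
print("acyclicity:", acyclicity(W_chain), "score:", regularized_score(W_chain, X))
```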
Regulatory efforts to govern large language model (LLM) development have predominantly focused on restricting access to high-performance computational resources. This study evaluates the efficacy of such measures by examining whether LLM capabilities can advance through algorithmic innovation in compute-constrained environments. We propose a novel framework distinguishing compute-dependent innovations--which yield disproportionate benefits at high compute--from compute-independent innovations, which improve efficiency across compute scales. The impact is quantified using Compute-Equivalent Gain (CEG). Experimental validation with nanoGPT models confirms that compute-independent advancements yield significant performance gains (e.g., with combined CEG up to 3.5×) across the tested scales. In contrast, compute-dependent advancements were detrimental to performance at smaller experimental scales, but showed improved CEG (on par with the baseline) as model size increased, a trend consistent with their definition of yielding primary benefits at higher compute. Crucially, these findings indicate that restrictions on computational hardware, while potentially slowing LLM progress, are insufficient to prevent all capability gains driven by algorithmic advancements. We argue that effective AI oversight must therefore incorporate mechanisms for understanding, anticipating, and potentially guiding algorithmic research, moving beyond a singular focus on hardware. The proposed framework also serves as an analytical tool for forecasting AI progress.
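One hedged way to operationalize CEG is sketched below: interpolate a baseline loss-versus-compute curve to find the compute the baseline would need to match the innovated run, then take the ratio. The log-log interpolation and the toy numbers are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch of one reading of Compute-Equivalent Gain (CEG).
import numpy as np

def compute_equivalent_gain(baseline_compute, baseline_loss,
                            innovation_compute, innovation_loss):
    """Interpolate the baseline loss-vs-compute curve (in log-compute) to find
    the compute at which the baseline would match the innovated run's loss."""
    log_c = np.log(np.asarray(baseline_compute, dtype=float))
    loss = np.asarray(baseline_loss, dtype=float)
    # np.interp needs an increasing x-axis, so interpolate over reversed loss.
    matched_log_c = np.interp(innovation_loss, loss[::-1], log_c[::-1])
    return float(np.exp(matched_log_c) / innovation_compute)

# Toy numbers: the innovated run reaches loss 2.9 with 1e17 FLOPs, which the
# baseline curve only reaches at ~4.6e17 FLOPs -> CEG of roughly 4.6.
print(compute_equivalent_gain([1e16, 1e17, 1e18], [3.5, 3.1, 2.8], 1e17, 2.9))
```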
PixCell, a diffusion-based generative foundation model, creates high-fidelity synthetic histopathology images conditioned on self-supervised UNI-2h embeddings. Trained on a dataset of 30.8 million patches, it achieves state-of-the-art image quality and enables controllable generation and virtual staining, demonstrating that synthetic data can effectively substitute for real data in self-supervised learning.
Improving time-to-first-token (TTFT) is an essential objective in modern large language model (LLM) inference engines. Optimizing TTFT directly results in higher maximal QPS and meets the requirements of many critical applications. However, improving TTFT is notoriously challenging since prefill is compute-bound and the performance bottleneck shifts from the self-attention that many prior works focus on to the MLP part. In this work, we present SpecPrefill, a training-free framework that accelerates inference TTFT for both long and medium context queries based on the following insight: LLMs are generalized enough to preserve quality given only a carefully chosen subset of prompt tokens. At its core, SpecPrefill leverages a lightweight model to speculate locally important tokens based on the context. These tokens, along with the necessary positional information, are then sent to the main model for processing. We evaluate SpecPrefill on a diverse set of tasks, followed by comprehensive benchmarking of the performance improvement both in a real end-to-end setting and in ablation studies. SpecPrefill manages to serve Llama-3.1-405B-Instruct-FP8 with up to 7× maximal end-to-end QPS on real downstream tasks and a 7.66× TTFT improvement.
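A minimal sketch of the token-selection step, assuming a lightweight draft model has already produced per-token importance scores; the scoring model and keep ratio are illustrative assumptions, not SpecPrefill's actual speculation mechanism.

```python
# Hedged sketch: keep only the prompt tokens a small model deems important, and
# pass them (with their original positions) to the main model for prefill.
import numpy as np

def select_important_tokens(token_scores, keep_ratio=0.3):
    """Return the original positions of the highest-scoring prompt tokens, in order."""
    k = max(1, int(keep_ratio * len(token_scores)))
    return np.sort(np.argsort(token_scores)[-k:])   # preserve original ordering

# Toy prompt of 12 tokens with importance scores from a small draft model.
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.6, 0.1, 0.9])
positions = select_important_tokens(scores, keep_ratio=0.3)
print("send to main model:", positions)  # positions double as the position ids
```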
NetLLM proposes a pioneering framework for adapting large language models (LLMs) to solve diverse networking problems by overcoming input modality gaps, output inefficiencies, and high adaptation costs. This approach achieved a 10.1-36.6% reduction in viewport prediction error, a 14.5-36.6% improvement in adaptive bitrate streaming Quality of Experience, and a 6.8-41.3% reduction in cluster job scheduling time, while demonstrating enhanced generalization compared to specialized learning-based baselines.