Lawrence Livermore National Laboratory
OpenAI researchers and collaborators evaluate GPT-5's utility in accelerating scientific research across diverse fields, demonstrating its capacity to contribute to rediscovering known results, literature search, collaborative problem-solving, and the generation of novel scientific findings. The model was shown to compress research timelines from months to hours and to provide verifiable new insights in mathematics, physics, and biology.
Researchers from ELLIS Institute Tübingen, University of Maryland, and Lawrence Livermore National Laboratory introduce a recurrent depth transformer architecture that scales reasoning abilities by implicitly processing information in a continuous latent space. The 3.5 billion parameter model, Huginn-0125, trained on the Frontier supercomputer, demonstrates significant performance gains on reasoning benchmarks with increased test-time iterations, sometimes matching or exceeding larger models without requiring specialized Chain-of-Thought training data.
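As a rough illustration of the recurrence, here is a minimal PyTorch sketch, assuming a simplified interface: a prelude embeds the input, a small core block is iterated r times on a latent state with the input re-injected each step, and a coda decodes. The names and wiring are illustrative, not Huginn's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Sketch of recurrent-depth reasoning: iterate a core block r times
    on a latent state. Structure is illustrative, not Huginn's code."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.prelude = nn.Linear(d_model, d_model)      # stand-in for embedding layers
        self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.adapter = nn.Linear(2 * d_model, d_model)  # re-injects the input each step
        self.coda = nn.Linear(d_model, d_model)         # stand-in for the output head

    def forward(self, x: torch.Tensor, r: int = 4) -> torch.Tensor:
        e = self.prelude(x)
        s = torch.randn_like(e)            # random initial latent state
        for _ in range(r):                 # more test-time iterations -> more "depth"
            s = self.core(self.adapter(torch.cat([s, e], dim=-1)))
        return self.coda(s)

model = RecurrentDepthLM(d_model=64)
h = model(torch.randn(2, 10, 64), r=8)    # scale compute at inference by raising r
```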
The article introduces the stochastic N-k interdiction problem for power grid operations and planning, which aims to identify a subset of k components (out of N) whose removal maximizes the expected damage, measured in terms of load shed. Uncertainty is modeled through a fixed set of outage scenarios, where each scenario represents a subset of components removed from the grid. We formulate the stochastic N-k interdiction problem as a bi-level optimization problem and propose two algorithmic solutions. The first approach reformulates the bi-level stochastic optimization problem into a single-level mixed-integer linear program (MILP) by dualizing the inner problem, then solves the resulting problem directly to global optimality with a MILP solver. The second is a heuristic cutting-plane approach, which is exact under certain assumptions. We compare these approaches in terms of computation time and solution quality using the IEEE Reliability Test System and present avenues for future research.
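A hedged sketch of the bi-level structure described above; the notation is illustrative and the paper's exact constraint sets may differ.

```latex
% Outer problem: interdictor picks k components to maximize expected load shed.
\max_{x \in \{0,1\}^N} \; \sum_{s \in S} p_s \, L_s(x)
\quad \text{s.t.} \quad \sum_{i=1}^{N} x_i = k,
\qquad
L_s(x) \;=\; \min_{(f,\,\ell) \in \mathcal{F}_s(x)} \; \sum_{j} \ell_j ,
```

where $x_i = 1$ interdicts component $i$, scenario $s$ removes an additional outage set with probability $p_s$, and the inner problem dispatches flows $f$ over the surviving network $\mathcal{F}_s(x)$ to minimize total load shed $\sum_j \ell_j$. Because the inner problem is a linear program for fixed $x$, strong duality lets the max-min collapse into the single-level MILP that the first approach solves directly.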
GRESO efficiently trains large language models for reasoning by selectively filtering out uninformative prompts before costly rollouts, achieving comparable accuracy to state-of-the-art methods while reducing total training time by up to 2.0 times and rollouts by up to 3.35 times. This method addresses the major computational bottleneck in RL-based LLM fine-tuning.
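To make the filtering idea concrete, here is a minimal sketch under my own assumptions: skip prompts whose recent rollouts showed no reward variance (all-correct or all-wrong groups carry no learning signal under group-relative RL objectives), with an occasional probabilistic re-check. The thresholds and skip rule are illustrative, not GRESO's exact criteria.

```python
import random
from collections import defaultdict

reward_history = defaultdict(list)  # prompt_id -> recent mean rewards per epoch

def should_rollout(prompt_id: str, skip_prob: float = 0.9) -> bool:
    """Decide before the expensive rollout whether a prompt is worth sampling."""
    hist = reward_history[prompt_id]
    if len(hist) < 2:
        return True  # not enough evidence yet; always explore new prompts
    # "Uninformative" if recent rollouts were uniformly all-correct or all-wrong:
    if all(r in (0.0, 1.0) for r in hist[-2:]) and len(set(hist[-2:])) == 1:
        return random.random() > skip_prob  # occasionally re-check stale prompts
    return True

def record(prompt_id: str, rewards: list[float]) -> None:
    """Log the mean reward of a finished rollout group for future filtering."""
    reward_history[prompt_id].append(sum(rewards) / len(rewards))
```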
Speculative Diffusion Decoding (SpecDiff) accelerates Large Language Model inference by replacing the autoregressive drafter with a parallel discrete diffusion model, achieving up to 7.2x speedup over standard autoregressive decoding and 1.75x over existing speculative decoding while preserving output quality. This approach reduces computational overhead and enables the use of longer draft lengths.
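For context, a minimal sketch of the verify phase common to speculative decoding, assuming an HF-style model that returns `.logits`; greedy acceptance is shown for simplicity, and SpecDiff's contribution is producing `draft_tokens` with a parallel discrete diffusion model rather than an autoregressive drafter.

```python
import torch

def speculative_step(target_model, draft_tokens, prefix):
    """Score all drafted tokens with one target forward pass and accept the
    longest matching prefix (greedy variant; sampling-based acceptance and
    the diffusion drafter itself are omitted)."""
    seq = torch.cat([prefix, draft_tokens])
    logits = target_model(seq.unsqueeze(0)).logits[0]    # single parallel pass
    preds = logits[len(prefix) - 1 : -1].argmax(dim=-1)  # target's token at each draft slot
    n_accept = 0
    for d, p in zip(draft_tokens.tolist(), preds.tolist()):
        if d != p:
            break
        n_accept += 1
    # On the first mismatch, keep the target's own token so progress >= 1;
    # if every draft token was accepted, the correction slice is empty.
    correction = preds[n_accept : n_accept + 1]
    return torch.cat([draft_tokens[:n_accept], correction])
```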
The Open Catalyst 2025 (OC25) dataset introduces the largest and most diverse collection of DFT calculations for solid-liquid interfaces, enabling machine learning models to accurately predict energies and forces in complex catalytic systems. This collaborative work from Meta FAIR, Lawrence Livermore National Laboratory, and Texas Tech University expands atomistic simulation capabilities beyond solid-gas interfaces, addressing critical challenges in sustainable energy and chemical production.
Discrete diffusion models are a class of generative models that construct sequences by progressively denoising samples from a categorical noise distribution. Beyond their rapidly growing ability to generate coherent natural language, these models present a new and important opportunity to enforce sequence-level constraints, a capability that current autoregressive models cannot natively provide. This paper capitalizes on this opportunity by introducing Constrained Discrete Diffusion (CDD), a novel integration of differentiable constraint optimization within the diffusion process to ensure adherence to constraints, logic rules, or safety requirements for generated sequences. Unlike conventional text generators that often rely on post-hoc filtering or model retraining for controllable generation, CDD injects constraints directly into the discrete diffusion sampling process, resulting in a training-free and effective approach. Experiments in toxicity-controlled text generation, property-constrained molecule design, and instruction-constrained text completion demonstrate that CDD achieves zero constraint violations across a diverse array of tasks while preserving fluency, novelty, and coherence, and that it outperforms autoregressive and existing discrete diffusion approaches.
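A heavily hedged sketch of the core idea: before sampling tokens at a denoising step, nudge the relaxed token distribution down the gradient of a differentiable constraint penalty. CDD proper uses a more careful projection/optimization scheme; the step counts, learning rate, and example penalty below are my own illustrative choices.

```python
import torch

def constrained_denoise_step(logits, penalty_fn, n_steps=5, lr=0.5):
    """Gradient-guided constraint correction on relaxed (simplex) token
    distributions at one denoising step; a sketch, not CDD's exact method."""
    z = logits.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        probs = torch.softmax(z, dim=-1)   # relaxed sequence on the simplex
        loss = penalty_fn(probs)           # differentiable constraint violation
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.distributions.Categorical(logits=z.detach()).sample()

# Example penalty (illustrative): expected probability mass on banned tokens.
banned = torch.tensor([7, 42])
penalty = lambda p: p[..., banned].sum()
tokens = constrained_denoise_step(torch.randn(1, 16, 100), penalty)
```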
Trajectory Balance with Asynchrony (TBA) introduces an asynchronous framework for Large Language Model (LLM) post-training, decoupling data generation from learning using the Trajectory Balance objective. This approach achieves up to 50x speedups in wall-clock time while improving performance on tasks such as mathematical reasoning, preference tuning, and automated red-teaming.
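For reference, a minimal sketch of the Trajectory Balance objective (from the GFlowNet literature) that TBA builds on; the asynchronous searcher/trainer machinery and replay buffer are omitted, and the tensors here are stand-ins.

```python
import torch

def trajectory_balance_loss(log_pf_sum, log_reward, log_Z):
    """TB objective per trajectory: (log Z + sum_t log pi(a_t|s_t) - log R(x))^2.
    log_pf_sum: summed log-probs of sampled tokens under the current policy.
    log_Z: learned scalar estimating the partition function. Because TB is an
    off-policy objective, it tolerates trajectories that arrive stale from
    asynchronous searcher nodes, which is what enables TBA's decoupling."""
    return ((log_Z + log_pf_sum - log_reward) ** 2).mean()

log_Z = torch.nn.Parameter(torch.zeros(()))   # trained alongside the policy
loss = trajectory_balance_loss(
    log_pf_sum=torch.randn(8), log_reward=torch.randn(8), log_Z=log_Z)
loss.backward()
```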
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies by Bartoldson et al. from Lawrence Livermore National Laboratory establishes empirical scaling laws and conducts human perception studies to redefine the practical limits of adversarial robustness in image classification. The research achieved a new state-of-the-art of 73.71% AutoAttack accuracy on CIFAR10 and demonstrated that a significant portion of adversarial examples are "invalid" and also confuse humans, suggesting a human-aligned robustness ceiling around 90%.
This work introduces "The Common Pile v0.1," an 8TB dataset meticulously curated from public domain and strictly openly licensed text, aiming to provide a legally transparent foundation for large language model pretraining. It demonstrates that performant 7-billion parameter language models can be trained on this dataset, exhibiting competitive results compared to models trained on widely used but often unlicensed data sources, particularly excelling in coding and knowledge-based tasks.
Dislocations are line defects in crystals that multiply and self-organize into a complex network during strain hardening. The length of dislocation links, connecting neighboring nodes within this network, contains crucial information about the evolving dislocation microstructure. By analyzing data from Discrete Dislocation Dynamics (DDD) simulations in face-centered cubic (fcc) Cu, we characterize the statistical distribution of link lengths of dislocation networks during strain hardening on individual slip systems. Our analysis reveals that link lengths on active slip systems follow a double-exponential distribution, while those on inactive slip systems conform to a single-exponential distribution. The distinctive long tail observed in the double-exponential distribution is attributed to the stress-induced bowing out of long links on active slip systems, a feature that disappears upon removal of the applied stress. We further demonstrate that both observed link length distributions can be explained by extending a one-dimensional Poisson process to include different growth functions. Specifically, the double-exponential distribution emerges when the growth rate for links exceeding a critical length becomes super-linear, which aligns with the physical phenomenon of long links bowing out under stress. This work advances our understanding of dislocation microstructure evolution during strain hardening and elucidates the underlying physical mechanisms governing its formation.
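A small NumPy illustration of the paper's baseline case: for a one-dimensional Poisson point process, the gaps between consecutive points (the "link lengths") are exponentially distributed, matching the inactive-slip-system result. The double exponential arises when the model is extended with a super-linear growth rate for links beyond a critical length (the bow-out mechanism); that extension is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
nodes = np.sort(rng.uniform(0.0, 1000.0, size=20000))  # Poisson process on a line
links = np.diff(nodes)                                  # link lengths between neighbors

mean = links.mean()
counts, edges = np.histogram(links, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# Single-exponential check: log density should be linear in link length,
# with slope approximately -1/mean.
fit = np.polyfit(centers[counts > 0], np.log(counts[counts > 0]), 1)
print(f"mean link length {mean:.3f}, fitted slope {fit[0]:.3f} vs -1/mean {-1/mean:.3f}")
```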
We introduce AegisLLM, a cooperative multi-agent defense against adversarial attacks and information leakage. In AegisLLM, autonomous agents in a structured workflow (orchestrator, deflector, responder, and evaluator) collaborate to ensure safe and compliant LLM outputs while self-improving over time through prompt optimization. We show that scaling the agentic reasoning system at test time, both by incorporating additional agent roles and by leveraging automated prompt optimization (such as DSPy), substantially enhances robustness without compromising model utility. This test-time defense enables real-time adaptability to evolving attacks without requiring model retraining. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM. On the WMDP unlearning benchmark, AegisLLM achieves near-perfect unlearning with only 20 training examples and fewer than 300 LM calls. On jailbreaking benchmarks, we achieve a 51% improvement over the base model on StrongReject, with a false-refusal rate of only 7.9% on PHTest compared to 18-55% for comparable methods. Our results highlight the advantages of adaptive, agentic reasoning over static defenses, establishing AegisLLM as a strong runtime alternative to traditional approaches based on model modifications. Code is available at this https URL
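A toy sketch of the four-role workflow, with each agent reduced to a stub; in the real system each agent is an LLM whose prompt is optimized automatically (e.g., with DSPy) rather than hard-coded as below.

```python
def orchestrator(query: str) -> str:
    """Toy triage rule standing in for an LLM-based safety classifier."""
    return "unsafe" if "bioweapon" in query.lower() else "safe"

def deflector(query: str) -> str:
    return "I can't help with that request."

def responder(query: str) -> str:
    return f"(model answer to: {query})"   # stand-in for the base LLM call

def evaluator(query: str, answer: str) -> bool:
    """Toy compliance check on the final output before it is released."""
    return "can't" in answer or orchestrator(query) == "safe"

def aegis_pipeline(query: str) -> str:
    if orchestrator(query) == "unsafe":
        return deflector(query)
    answer = responder(query)
    return answer if evaluator(query, answer) else deflector(query)

print(aegis_pipeline("How do I bake bread?"))
```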
A new uncertainty quantification framework, Shifting Attention to Relevance (SAR), improves the reliability of Large Language Models by re-weighting uncertainty based on the semantic relevance of linguistic components. It yields an average of 7.1% AUROC improvement over prior methods in identifying incorrect free-form generations across various LLMs and datasets.
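A minimal sketch of the re-weighting step, assuming relevance scores are already available; in the paper, a token's relevance is derived from the drop in sentence-level semantic similarity when that token is removed (via a sentence-embedding model), a step omitted here.

```python
import numpy as np

def sar_uncertainty(token_nlls, relevance):
    """Relevance-weighted sequence uncertainty in the spirit of SAR: re-weight
    per-token negative log-likelihoods by semantic relevance so that filler
    tokens contribute less to the sequence-level score."""
    w = np.asarray(relevance, dtype=float)
    w = w / w.sum()   # normalize relevance into attention-like weights
    return float(np.dot(np.asarray(token_nlls, dtype=float), w))

# Filler tokens get low relevance; the uncertain content token dominates.
score = sar_uncertainty(token_nlls=[0.2, 3.1, 0.1], relevance=[0.05, 0.9, 0.05])
print(f"relevance-weighted uncertainty: {score:.3f}")
```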
OpenMC successfully demonstrates performance portability for Monte Carlo particle transport across Intel, NVIDIA, and AMD GPUs using a single OpenMP target offloading codebase. It achieved over 1 billion particles per second on complex reactor simulations and established Intel's Ponte Vecchio Max 1550 as a leading GPU architecture by outperforming NVIDIA A100, GH200, and AMD MI250X GPUs.
Researchers from the University of Maryland and Lawrence Livermore National Laboratory investigated multi-node Large Language Model inference, identifying `all-reduce` communication as a key bottleneck in decode-heavy workloads. They developed NVRAR, an NVSHMEM-based `all-reduce` algorithm, which improved end-to-end inference performance by up to 1.86x on 32 GPUs for the Llama 3.1 70B model.
Researchers at Lawrence Livermore National Laboratory devised three globalized strategies for the Carleman linear embedding method, enabling it to accurately model a broad spectrum of nonlinear dynamical systems, including those with multiple fixed points, limit cycles, and chaotic attractors. This advancement expands Carleman's utility for tasks like Koopman mode decomposition and positions it for future hybrid classical-quantum simulations.
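For background, a hedged sketch of plain (local) Carleman linearization on a scalar example, which is the starting point the globalized strategies improve upon; this is the textbook construction, not the paper's variants.

```latex
% For \dot{x} = a x + b x^2, define the monomial coordinates y_k = x^k. Then
\dot{y}_k = k\,x^{k-1}\dot{x} = k a\, y_k + k b\, y_{k+1}, \qquad k = 1, \dots, K,
```

Truncating at $y_{K+1} \approx 0$ yields a finite linear system $\dot{\mathbf{y}} = A\mathbf{y}$ with an upper-triangular $A$, whose accuracy degrades away from the fixed point at the origin. That locality limitation is precisely what the globalization strategies address, enabling Carleman embedding to handle multiple fixed points, limit cycles, and chaotic attractors.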
Spatial distributions of electrons ionized and scattered from ultra-low-pressure gases are proposed and experimentally demonstrated as a method to directly measure the intensity of an ultra-high-intensity laser pulse. Analytic models relating the peak scattered-electron energy to the peak laser intensity are derived and compared to paraxial Runge-Kutta simulations, highlighting two models suitable for describing electrons scattered from weakly paraxial beams (f_{#} > 5) at intensities in the range of 10^{18}-10^{21} W cm^{-2}. Scattering energies are shown to be dependent on gas species, emphasizing the need for specific gases for given intensity ranges. Direct measurements of the laser intensity at full power are demonstrated on two laser systems, both showing good agreement between indirect methods of intensity measurement and the proposed method. One experiment exhibited the role of spatial aberrations in the scattered electron distribution, motivating a qualitative study of the effect. We propose the use of convolutional neural networks as a method for extracting quantitative information about the spatial structure of the laser at full power. We believe the presented technique to be a powerful tool that can be immediately implemented in many high-power laser facilities worldwide.
NEFTune introduces random noise into the embedding vectors during instruction fine-tuning, consistently improving the conversational quality and instruction-following abilities of large language models. The method achieves substantial gains, such as a nearly 35 percentage point increase in AlpacaEval Win Rate for LLaMA-2-7B, by mitigating overfitting to instruction datasets without incurring additional computational cost.
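The noise rule itself is simple; a minimal sketch following the scaling described in the paper (uniform noise scaled by alpha / sqrt(L * d)), with the integration into a real training loop, such as hooking the model's embedding layer, left out.

```python
import torch

def neftune_embed(embeddings: torch.Tensor, alpha: float = 5.0,
                  training: bool = True) -> torch.Tensor:
    """Add NEFTune-style uniform noise to token embeddings during fine-tuning.
    embeddings: (batch, L, d) tensor from the model's embedding layer.
    alpha: noise magnitude hyperparameter; noise is disabled at inference."""
    if not training:
        return embeddings
    L, d = embeddings.shape[-2], embeddings.shape[-1]
    noise = torch.empty_like(embeddings).uniform_(-1.0, 1.0)
    return embeddings + (alpha / (L * d) ** 0.5) * noise

x = torch.randn(4, 128, 4096)          # (batch, sequence length L, dim d)
noisy = neftune_embed(x, alpha=5.0)    # applied only on instruction-tuning data
```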
Researchers at the University of Maryland and collaborators developed Abacus Embeddings and recurrent (Looped Transformer) architectures, enabling transformers to perform multi-digit arithmetic with a 6x generalization factor for addition, reaching over 99% accuracy on 100-digit problems after training on 20-digit examples. This approach effectively addresses the challenge of positional understanding in numerical sequences and transfers to other algorithmic tasks like multiplication and sorting.
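A hedged sketch of the positional indexing behind Abacus Embeddings: each digit token receives an index for its place within its own number, and a random offset is added during training so the model encounters large indices even on short training numbers. Details such as digit ordering and the offset range follow my reading of the paper and may differ.

```python
import random

def abacus_positions(tokens, max_offset=10, training=True):
    """Compute per-token digit-position indices; non-digit tokens get 0 and
    reset the digit counter, so each number is indexed independently."""
    offset = random.randint(0, max_offset) if training else 0
    positions, run = [], 0
    for t in tokens:
        run = run + 1 if t.isdigit() else 0   # restart count between numbers
        positions.append(offset + run if t.isdigit() else 0)
    return positions

print(abacus_positions(list("123+4567="), training=False))
# -> [1, 2, 3, 0, 1, 2, 3, 4, 0]
```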