KVFlow introduces a workflow-aware KV cache management framework that leverages an "Agent Step Graph" and overlapped prefetching to enhance serving efficiency for LLM-based multi-agent workflows. It achieved up to 1.83x speedup for single workflows and 2.19x for concurrent workflows compared to reactive caching methods.
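To make the "Agent Step Graph" idea concrete, here is a minimal sketch, assuming a workflow DAG keyed by agent name, of how distance-to-next-execution could drive eviction priority; the function and graph below are illustrative assumptions, not KVFlow's actual interface.

```python
# Hypothetical sketch: rank agents' KV caches by "steps to next execution"
# over a workflow DAG, as a workflow-aware eviction priority.
from collections import deque

def steps_to_execution(graph: dict[str, list[str]], active: str) -> dict[str, int]:
    """BFS distance from the currently executing agent to every reachable
    agent in the workflow graph; farther agents are evicted first."""
    dist = {active: 0}
    queue = deque([active])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Example workflow: planner -> (coder, tester) -> reviewer
workflow = {"planner": ["coder", "tester"], "coder": ["reviewer"], "tester": ["reviewer"]}
priorities = steps_to_execution(workflow, active="planner")
evict_order = sorted(priorities, key=priorities.get, reverse=True)  # farthest first
print(evict_order)
```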
Benchmark quality is critical for meaningful evaluation and sustained progress in time series forecasting, particularly given the recent rise of pretrained models. Existing benchmarks often have narrow domain coverage or overlook important real-world settings, such as tasks with covariates. Additionally, their aggregation procedures often lack statistical rigor, making it unclear whether observed performance differences reflect true improvements or random variation. Many benchmarks also fail to provide infrastructure for consistent evaluation or are too rigid to integrate into existing pipelines. To address these gaps, we propose fev-bench, a benchmark comprising 100 forecasting tasks across seven domains, including 46 tasks with covariates. Supporting the benchmark, we introduce fev, a lightweight Python library for benchmarking forecasting models that emphasizes reproducibility and seamless integration with existing workflows. Using fev, fev-bench employs principled aggregation methods with bootstrapped confidence intervals to report model performance along two complementary dimensions: win rates and skill scores. We report results on fev-bench for various pretrained, statistical, and baseline models, and identify promising directions for future research.
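As an illustration of the aggregation style described, a generic bootstrap over tasks can attach confidence intervals to a pairwise win rate; this NumPy sketch is ours, not the fev library's API.

```python
# Minimal sketch of a bootstrapped confidence interval for a pairwise win
# rate, the kind of aggregation fev-bench describes.
import numpy as np

def win_rate_ci(errors_a, errors_b, n_boot=1000, alpha=0.05, seed=0):
    """Fraction of tasks where model A beats model B, with a percentile
    bootstrap CI over tasks (ties count as half a win)."""
    rng = np.random.default_rng(seed)
    errors_a, errors_b = np.asarray(errors_a), np.asarray(errors_b)
    wins = np.where(errors_a < errors_b, 1.0, np.where(errors_a == errors_b, 0.5, 0.0))
    stats = [wins[rng.integers(0, len(wins), len(wins))].mean() for _ in range(n_boot)]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return wins.mean(), (lo, hi)

rate, (lo, hi) = win_rate_ci([0.8, 0.6, 0.9, 0.4], [0.9, 0.7, 0.8, 0.5])
print(f"win rate {rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```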
This survey paper from Amazon and AWS provides a comprehensive review of the current landscape for applying Large Language Models to tabular data, covering tasks such as prediction, data generation, and table understanding. It systematically categorizes key techniques, datasets, and metrics, finding that LLMs excel in table understanding and generation when appropriately serialized and prompted, though they often do not yet surpass traditional methods for prediction tasks.
Marconi, developed by researchers at Princeton University and Amazon Web Services, is the first system to implement efficient prefix caching for Hybrid Large Language Models, which combine Attention and State Space Model layers. It introduces intelligent cache admission and FLOP-aware eviction policies, achieving significantly higher token hit rates and reduced Time To First Token latency compared to previous approaches.
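As a rough illustration of what "FLOP-aware eviction" could mean, the sketch below scores cached prefixes by estimated recompute FLOPs saved per byte of memory; the cost model and field names are our assumptions, not Marconi's implementation.

```python
# Hedged sketch of a FLOP-aware eviction policy: score each cached prefix by
# compute saved per byte of memory and evict the lowest scorers first.
from dataclasses import dataclass

@dataclass
class CachedPrefix:
    name: str
    tokens: int             # prefix length a cache hit would skip recomputing
    bytes_used: int         # memory footprint (KV entries + SSM states)
    flops_per_token: float  # model-dependent cost of recomputing one token

def eviction_order(entries: list[CachedPrefix]) -> list[CachedPrefix]:
    # FLOPs saved on a hit, normalized by the memory the entry occupies.
    def score(e: CachedPrefix) -> float:
        return (e.tokens * e.flops_per_token) / e.bytes_used
    return sorted(entries, key=score)  # lowest savings-per-byte evicted first

cache = [CachedPrefix("system_prompt", 512, 1 << 20, 2e9),
         CachedPrefix("short_chat", 64, 1 << 19, 2e9)]
print([e.name for e in eviction_order(cache)])
```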
Researchers at AWS Bedrock developed a hierarchical multi-agent collaboration framework designed for enterprise applications, addressing complex tasks through coordinated specialist agents. The framework achieved a 90% Goal Success Rate across diverse domains and demonstrated improved communication efficiency through mechanisms like payload referencing and dynamic routing.
The A2AS framework introduces a runtime security and self-defense model for agentic AI systems, addressing prompt injection and other vulnerabilities through native, context-window-integrated controls. This approach aims to provide robust protection without the added latency, external dependencies, or performance degradation often associated with current security solutions.
We present CID-GraphRAG (Conversational Intent-Driven Graph Retrieval Augmented Generation), a novel framework that addresses the limitations of existing dialogue systems in maintaining both contextual coherence and goal-oriented progression in multi-turn customer service conversations. Unlike traditional RAG systems that rely solely on semantic similarity (Conversation RAG) or standard knowledge graphs (GraphRAG), CID-GraphRAG constructs dynamic intent transition graphs from goal-achieved historical dialogues and implements a dual-retrieval mechanism that adaptively balances intent-based graph traversal with semantic search. This approach enables the system to simultaneously leverage both conversational intent flow patterns and contextual semantics, significantly improving both retrieval and response quality. In extensive experiments on real-world customer service dialogues, we employ both automatic metrics and LLM-as-judge assessments, demonstrating that CID-GraphRAG significantly outperforms both semantic-based Conversation RAG and intent-based GraphRAG baselines across all evaluation criteria. Quantitatively, CID-GraphRAG demonstrates substantial improvements over Conversation RAG across automatic metrics, with relative gains of 11% in BLEU, 5% in ROUGE-L, 6% in METEOR, and, most notably, a 58% improvement in response quality according to LLM-as-judge evaluations. These results demonstrate that the integration of intent transition structures with semantic retrieval creates a synergistic effect that neither approach achieves independently, establishing CID-GraphRAG as an effective framework for addressing the challenges of maintaining contextual coherence and goal-oriented progression in knowledge-intensive multi-turn dialogues.
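A minimal sketch of the dual-retrieval idea, assuming a next-intent distribution from the transition graph and embedding vectors for candidate responses; the linear blend and its weight are illustrative assumptions rather than CID-GraphRAG's exact mechanism.

```python
# Illustrative dual-retrieval scorer: blend an intent-transition score from
# a graph with a semantic-similarity score over candidate responses.
import numpy as np

def dual_retrieval_scores(intent_probs: dict[str, float],
                          candidate_intents: list[str],
                          query_vec: np.ndarray,
                          candidate_vecs: np.ndarray,
                          weight: float = 0.5) -> np.ndarray:
    """weight * P(next intent from transition graph) +
       (1 - weight) * cosine similarity of the candidate to the query."""
    intent_score = np.array([intent_probs.get(i, 0.0) for i in candidate_intents])
    sims = candidate_vecs @ query_vec
    sims = sims / (np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return weight * intent_score + (1 - weight) * sims

probs = {"ask_refund_status": 0.7, "close_ticket": 0.1}
scores = dual_retrieval_scores(probs, ["ask_refund_status", "close_ticket"],
                               np.array([1.0, 0.0]), np.array([[0.9, 0.1], [0.1, 0.9]]))
print(scores.argmax())  # best candidate under the blended score
```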
UCLA researchers developed a formal definition and efficient "scrubbing" methods for selective forgetting in deep networks, allowing information about specific training data to be removed from model weights while providing a measurable bound on retained information. These methods demonstrate superior information removal and utility preservation compared to baseline approaches by applying controlled noise to model weights based on information theory and stochastic gradient descent stability.
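As a loose illustration of the scrubbing idea, the sketch below perturbs weights with noise inversely scaled by their diagonal Fisher information on the data to retain, so directions unimportant for the retained task are perturbed most; the paper's exact scaling differs, so treat this as a simplified assumption.

```python
# Simplified Fisher-noise "scrubbing" sketch (not the paper's exact rule):
# weights with low Fisher information on retained data get more noise.
import torch

def scrub(params, fisher_diag, strength=1e-3, eps=1e-8):
    """params: iterable of weight tensors; fisher_diag: matching diagonal
    Fisher estimates computed on the data to retain."""
    with torch.no_grad():
        for p, f in zip(params, fisher_diag):
            noise_std = strength / (f.sqrt() + eps)  # inverse-Fisher scaling
            p.add_(torch.randn_like(p) * noise_std)
```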
Researchers introduce a unified framework, ALDI, for Domain Adaptive Object Detection, which exposes systematic benchmarking flaws and allows for fair comparisons among methods. They also propose ALDI++, a new method that achieves state-of-the-art performance across diverse benchmarks, and release CFC-DAOD, a real-world dataset for environmental monitoring.
Researchers from Caltech, the University of Washington, and AWS Center for Quantum Computing developed a quantum algorithm for efficient wavepacket preparation that achieves superexponential efficiency gains. Their work demonstrated inelastic particle production in quantum simulations of one-dimensional Ising field theory, implemented on a 100-qubit IBM quantum computer, with results validated against classical matrix product state simulations after error mitigation.
PrefVLM introduces a framework that significantly reduces the human feedback required for preference-based Reinforcement Learning by synergistically integrating Vision-Language Models with selective human feedback. The approach achieves comparable or better robotic task success rates using half the human annotations and demonstrates a fourfold reduction in feedback for knowledge transfer to new tasks.
We present an efficient machine learning (ML) algorithm for predicting any unknown quantum process $\mathcal{E}$ over $n$ qubits. For a wide range of distributions $\mathcal{D}$ on arbitrary $n$-qubit states, we show that this ML algorithm can learn to predict any local property of the output from the unknown process $\mathcal{E}$, with a small average error over input states drawn from $\mathcal{D}$. The ML algorithm is computationally efficient even when the unknown process is a quantum circuit with exponentially many gates. Our algorithm combines efficient procedures for learning properties of an unknown state and for learning a low-degree approximation to an unknown observable. The analysis hinges on proving new norm inequalities, including a quantum analogue of the classical Bohnenblust-Hille inequality, which we derive by giving an improved algorithm for optimizing local Hamiltonians. Numerical experiments on predicting quantum dynamics with evolution time up to $10^6$ and system size up to 50 qubits corroborate our proof. Overall, our results highlight the potential for ML models to predict the output of complex quantum dynamics much faster than the time needed to run the process itself.
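For context, the classical Bohnenblust-Hille inequality that the abstract's quantum analogue parallels can be stated for low-degree functions on the Boolean cube; the block below is our paraphrase of that standard statement, with $C(d)$ a degree-dependent constant, and is not the paper's precise quantum version.

```latex
% Classical Bohnenblust-Hille inequality on the Boolean cube (paraphrase):
% for f : \{-1,1\}^n \to \mathbb{R} of degree at most d,
% with Fourier coefficients \hat{f}(S),
\left( \sum_{|S| \le d} |\hat{f}(S)|^{\frac{2d}{d+1}} \right)^{\frac{d+1}{2d}}
\;\le\; C(d)\, \|f\|_\infty ,
% where C(d) depends only on d, not on n. The quantum analogue replaces the
% Fourier coefficients with the Pauli coefficients of a low-degree observable.
```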
TASK2VEC proposes a method to create fixed-dimensional vectorial representations of machine learning tasks using the Fisher Information Matrix of a probe network. These task embeddings enable efficient meta-learning, showing strong correlation with semantic and taxonomic task similarities and demonstrating substantial improvements in selecting optimal pre-trained models for new tasks, particularly in low-data environments.
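A minimal PyTorch sketch of the core computation, assuming a classification probe network and a data loader: averaged squared gradients approximate the diagonal of the Fisher Information Matrix, whose flattened entries form the task embedding. The paper's robust Fisher estimation and layer-selection details are omitted here.

```python
# TASK2VEC-style embedding sketch: diagonal empirical Fisher of a probe
# network's weights on a task's data. `probe` and `loader` are placeholders.
import torch

def task2vec_embedding(probe, loader, device="cpu"):
    probe.to(device).eval()
    fisher = [torch.zeros_like(p) for p in probe.parameters()]
    n_batches = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        probe.zero_grad()
        loss = torch.nn.functional.cross_entropy(probe(x), y)
        loss.backward()
        for f, p in zip(fisher, probe.parameters()):
            if p.grad is not None:
                f += p.grad.detach() ** 2  # accumulate squared gradients
        n_batches += 1
    # Flattened, averaged diagonal Fisher = fixed-dimensional task embedding.
    return torch.cat([(f / max(n_batches, 1)).flatten() for f in fisher])
```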
114
Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. To leverage this tradeoff effectively, users need fine-grained control over the amount of thinking used for a particular query, but few approaches enable such control. Existing methods require users to specify the absolute number of desired tokens, but this requires knowing the difficulty of the problem beforehand to appropriately set the token budget for a query. To address these issues, we propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens relative to the current average chain-of-thought length for each query. This approach eliminates dataset- and phase-specific tuning while producing better cost-accuracy tradeoff curves compared to standard methods. Users can dynamically adjust the cost-accuracy trade-off through a continuous effort parameter specified at inference time. We observe that the model automatically learns to allocate resources proportionally to the task difficulty and, across model scales ranging from 1.5B to 32B parameters, our approach enables a 2-3x reduction in chain-of-thought length while maintaining or improving performance relative to the base model used for RL training.
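One plausible reading of the training signal, sketched below: reward correctness and penalize deviation from a target length equal to the effort fraction times the running average chain-of-thought length. The reward shape and penalty weight are assumptions, not the paper's formula.

```python
# Hypothetical length-conditioned RL reward in the spirit of Adaptive Effort
# Control: the target length adapts with the current average CoT length.
def effort_reward(correct: bool, cot_len: int, avg_cot_len: float,
                  effort: float, penalty: float = 0.5) -> float:
    target = effort * avg_cot_len  # e.g. effort=0.5 -> half the average length
    length_term = abs(cot_len - target) / max(avg_cot_len, 1.0)
    return float(correct) - penalty * length_term

# A running average of CoT length makes the target self-adaptive during RL.
avg = 800.0
print(effort_reward(correct=True, cot_len=420, avg_cot_len=avg, effort=0.5))
```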
Covariate shift is a common problem in real-world predictive modeling. This paper proposes addressing the covariate shift problem by minimizing Maximum Mean Discrepancy (MMD) statistics between the training and test sets in the feature input space, the feature representation space, or both. We designed three techniques, which we call MMD Representation, MMD Mask, and MMD Hybrid, to deal with the scenarios where only a distribution shift exists, only a missingness shift exists, or both types of shift exist, respectively. We find that integrating an MMD loss component helps models use the best features for generalization and avoid dangerous extrapolation as much as possible for each test sample. Models treated with this MMD approach show better performance, calibration, and extrapolation on the test set.
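For reference, the squared MMD statistic with an RBF kernel can be computed as below; minimizing such a term between training and test batches (in input or representation space) is the mechanism the three techniques share. This generic PyTorch sketch is ours, not the paper's code.

```python
# Squared MMD between a training batch X and a test batch Y with an RBF
# kernel; adding this as a loss term aligns the two distributions.
import torch

def mmd2_rbf(X: torch.Tensor, Y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimator of MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def k(A, B):
        d2 = torch.cdist(A, B) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

X, Y = torch.randn(128, 16), torch.randn(128, 16) + 0.5  # shifted test batch
print(mmd2_rbf(X, Y).item())  # larger values indicate a bigger covariate shift
```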
Researchers advanced Quantum-Classical Auxiliary Field Quantum Monte Carlo, demonstrating its application to a nickel-catalyzed reaction using 24 qubits on an IonQ Forte quantum computer. The work substantially reduced classical post-processing time by orders of magnitude through algorithmic innovations and GPU acceleration, making the hybrid method viable for chemically relevant systems.
Software updates, including bug repair and feature additions, are frequent in modern applications, but they often leave test suites outdated, resulting in undetected bugs and increased chances of system failures. A recent study by Meta revealed that 14%-22% of software failures stem from outdated tests that fail to reflect changes in the codebase. This highlights the need to keep tests in sync with code changes to ensure software reliability. In this paper, we present UTFix, a novel approach for repairing unit tests when their corresponding focal methods undergo changes. UTFix addresses two critical issues: assertion failure and reduced code coverage caused by changes in the focal method. Our approach leverages language models to repair unit tests by providing contextual information such as static code slices, dynamic code slices, and failure messages. We evaluate UTFix on our generated synthetic benchmarks (Tool-Bench) and real-world benchmarks. Tool-Bench includes diverse changes from popular open-source Python GitHub projects, where UTFix successfully repaired 89.2% of assertion failures and achieved 100% code coverage for 96 out of 369 tests. On the real-world benchmarks, UTFix repairs 60% of assertion failures while achieving 100% code coverage for 19 out of 30 unit tests. To the best of our knowledge, this is the first comprehensive study focused on unit test repair in evolving Python projects. Our contributions include the development of UTFix, the creation of Tool-Bench and the real-world benchmarks, and the demonstration of the effectiveness of LLM-based methods in addressing unit test failures due to software evolution.
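A hedged sketch of how the repair context might be assembled, mirroring the inputs the abstract names; the template wording and the `call_llm` helper are hypothetical, not UTFix's actual prompts.

```python
# Hypothetical prompt assembly for LLM-based unit test repair, using the
# context kinds named in the abstract (code slices, failure message).
def build_repair_prompt(focal_method: str, broken_test: str,
                        static_slice: str, dynamic_slice: str,
                        failure_message: str) -> str:
    return "\n\n".join([
        "Repair the failing unit test so it reflects the updated focal method.",
        f"### Updated focal method\n{focal_method}",
        f"### Failing test\n{broken_test}",
        f"### Static slice (code the test depends on)\n{static_slice}",
        f"### Dynamic slice (values observed at runtime)\n{dynamic_slice}",
        f"### Failure message\n{failure_message}",
        "Return only the repaired test.",
    ])

# repaired_test = call_llm(build_repair_prompt(...))  # hypothetical LLM call
```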
Humans acquire language through implicit learning, absorbing complex patterns without explicit awareness. While LLMs demonstrate impressive linguistic capabilities, it remains unclear whether they exhibit human-like pattern recognition during in-context learning at inference time. We adapted three classic artificial language learning experiments spanning morphology, morphosyntax, and syntax to systematically evaluate implicit learning at inference time in two state-of-the-art OpenAI models: gpt-4o and o3-mini. Our results reveal linguistic domain-specific alignment between models and human behaviors: o3-mini aligns better with humans in morphology, while both models align in syntax.
We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction, which can be applied to any trained neural network to generate reliable uncertainty estimates, as well as scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved uncertainty quantification and accuracy. By providing a coherent framework for advanced uncertainty quantification methods, Fortuna simplifies the process of benchmarking and helps practitioners build robust AI systems.
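For intuition, split conformal prediction, one of the calibration techniques Fortuna supports, can be sketched in a few lines of NumPy; this generic version wraps any point predictor and is not Fortuna's own API.

```python
# Generic split-conformal sketch: a quantile of absolute calibration
# residuals widens point predictions into intervals with ~ 1 - alpha
# marginal coverage, for any trained regressor.
import numpy as np

def split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1):
    residuals = np.abs(y_cal - predict(X_cal))
    n = len(residuals)
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    preds = predict(X_test)
    return preds - q, preds + q

# Example with a trivial stand-in "model":
predict = lambda X: X.mean(axis=1)
X_cal, y_cal = np.random.randn(200, 4), np.random.randn(200)
lo, hi = split_conformal(predict, X_cal, y_cal, np.random.randn(5, 4))
print(lo, hi)
```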
Universality and scaling laws are hallmarks of equilibrium phase transitions and critical phenomena. However, extending these concepts to non-equilibrium systems is an outstanding challenge. Despite recent progress in the study of dynamical phases, the universality classes and scaling laws for non-equilibrium phenomena are far less understood than those in equilibrium. In this work, using a trapped-ion quantum simulator with single-spin resolution, we investigate the non-equilibrium nature of critical fluctuations following a quantum quench to the critical point. We probe the scaling of spin fluctuations after a series of quenches to the critical Hamiltonian of a long-range Ising model. With systems of up to 50 spins, we show that the amplitude and timescale of the post-quench fluctuations scale with system size with distinct universal critical exponents, depending on the quench protocol. While a generic quench can lead to thermal critical behavior, we find that a second quench from one critical state to another (i.e., a double quench) results in a new universal non-equilibrium behavior, identified by a set of critical exponents distinct from their equilibrium counterparts. Our results demonstrate the ability of quantum simulators to explore universal scaling beyond equilibrium.