cloud-computing
PaperDebugger, developed by NUS researchers, introduces an in-editor multi-agent system as a Chrome extension for Overleaf, embedding LLM-driven academic writing assistance directly into the LaTeX editing environment. The system offers capabilities like structured critiques, text refinement, and literature lookup via patch-based edits, demonstrating successful technical integration and positive early user adoption.
The rising global prevalence of diabetes necessitates early detection to prevent severe complications. While AI-powered prediction applications offer a promising solution, they require a responsive and scalable back-end architecture to serve a large user base effectively. This paper details the development and evaluation of a scalable back-end system designed for a mobile diabetes prediction application. The primary objective was to maintain a failure rate below 5% and an average latency of under 1000 ms. The architecture leverages horizontal scaling, database sharding, and asynchronous communication via a message queue. Performance evaluation showed that 83% of the system's features (20 out of 24) met the specified performance targets. Key functionalities such as user profile management, activity tracking, and read-intensive prediction operations successfully achieved the desired performance. The system demonstrated the ability to handle up to 10,000 concurrent users without issues, validating its scalability. The implementation of asynchronous communication using RabbitMQ proved crucial in minimizing the error rate for computationally intensive prediction requests, ensuring system reliability by queuing requests and preventing data loss under heavy load.
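The abstract does not include implementation details, but the queuing pattern it describes (durable queues plus acknowledgment after processing, so prediction requests survive load spikes) can be sketched with the pika client. The queue name, payload shape, and broker address below are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the queuing pattern described above, using pika.
# Queue name, payload shape, and broker address are illustrative assumptions.
import json
import pika

params = pika.ConnectionParameters(host="localhost")

def submit_prediction_request(features: dict) -> None:
    """Producer: enqueue a prediction request instead of calling the model directly."""
    conn = pika.BlockingConnection(params)
    ch = conn.channel()
    ch.queue_declare(queue="predictions", durable=True)      # survive broker restarts
    ch.basic_publish(
        exchange="",
        routing_key="predictions",
        body=json.dumps(features),
        properties=pika.BasicProperties(delivery_mode=2),    # persist the message
    )
    conn.close()

def run_worker(predict) -> None:
    """Consumer: ack only after the model finishes, so requests survive crashes."""
    conn = pika.BlockingConnection(params)
    ch = conn.channel()
    ch.queue_declare(queue="predictions", durable=True)
    ch.basic_qos(prefetch_count=1)                           # one in-flight job per worker

    def on_message(channel, method, properties, body):
        predict(json.loads(body))                            # compute-heavy model call
        channel.basic_ack(delivery_tag=method.delivery_tag)

    ch.basic_consume(queue="predictions", on_message_callback=on_message)
    ch.start_consuming()
```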
AI agents powered by large language models are increasingly deployed as cloud services that autonomously access sensitive data, invoke external tools, and interact with other agents. However, these agents run within a complex multi-party ecosystem, where untrusted components can lead to data leakage, tampering, or unintended behavior. Existing Confidential Virtual Machines (CVMs) provide only per-binary protection and offer no guarantees for cross-principal trust, accelerator-level isolation, or supervised agent behavior. We present Omega, a system that enables trusted AI agents by enforcing end-to-end isolation, establishing verifiable trust across all contributing principals, and supervising every external interaction with accountable provenance. Omega builds on Confidential VMs and Confidential GPUs to create a Trusted Agent Platform that hosts many agents within a single CVM using nested isolation. It also provides efficient multi-agent orchestration with cross-principal trust establishment via differential attestation, and a policy specification and enforcement framework that governs data access, tool usage, and inter-agent communication for data protection and regulatory compliance. Implemented on AMD SEV-SNP and NVIDIA H100, Omega fully secures agent state across the CVM-GPU boundary, and achieves high performance while enabling high-density, policy-compliant multi-agent deployments at cloud scale.
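Omega's policy language is not specified in the abstract; the following is a hypothetical Python sketch of what policy-gated tool invocation with accountable provenance logging could look like. Every class, field, and tool name here is an assumption for illustration.

```python
# Hypothetical sketch of policy-gated tool invocation with provenance logging.
# Omega's actual policy framework is not public here; every name below is assumed.
from dataclasses import dataclass, field
import time

@dataclass
class AgentPolicy:
    allowed_tools: set[str]
    allowed_data_scopes: set[str]
    allowed_peers: set[str]
    provenance: list[dict] = field(default_factory=list)

    def check_tool(self, agent_id: str, tool: str, data_scope: str) -> bool:
        decision = tool in self.allowed_tools and data_scope in self.allowed_data_scopes
        # Every external interaction is recorded for accountable provenance.
        self.provenance.append({
            "ts": time.time(), "agent": agent_id, "tool": tool,
            "scope": data_scope, "allowed": decision,
        })
        return decision

policy = AgentPolicy(
    allowed_tools={"web_search"},
    allowed_data_scopes={"public"},
    allowed_peers={"summarizer"},
)
assert policy.check_tool("agent-1", "web_search", "public")
assert not policy.check_tool("agent-1", "db_query", "patient_records")
```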
The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time detection and response across multimodal data streams. This paper introduces AgenticCyber, a generative-AI-powered multi-agent system that orchestrates specialized agents to monitor cloud logs, surveillance videos, and environmental audio concurrently. The solution achieves a 96.2% F1-score in threat detection, reduces response latency to 420 ms, and enables adaptive security posture management using multimodal language models such as Google's Gemini coupled with LangChain for agent orchestration. Evaluations on benchmark datasets, including AWS CloudTrail logs, UCF-Crime video frames, and UrbanSound8K audio clips, show superior performance over standard intrusion detection systems, reducing mean time to respond (MTTR) by 65% and improving situational awareness. This work introduces a scalable, modular, proactive cybersecurity architecture for enterprise networks and IoT ecosystems that overcomes siloed security tooling through cross-modal reasoning and automated remediation.
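The paper's Gemini and LangChain wiring is not reproduced here, but the orchestration pattern (modality-specific agents run concurrently, their verdicts fused into one threat decision) can be sketched with asyncio. All agent functions and the fusion rule are stand-ins.

```python
# Illustrative sketch of concurrent modality-specific agents with score fusion.
# The paper's actual Gemini/LangChain wiring is not shown; all names are assumed.
import asyncio

async def log_agent(window) -> float:
    return 0.9 if "UnauthorizedAccess" in window else 0.1   # stand-in for an LLM call

async def video_agent(frames) -> float:
    return 0.2                                              # stand-in for a vision model

async def audio_agent(clip) -> float:
    return 0.7                                              # stand-in for an audio model

async def assess(window, frames, clip, threshold: float = 0.5) -> bool:
    # Run all three agents concurrently, then fuse into one threat verdict.
    scores = await asyncio.gather(
        log_agent(window), video_agent(frames), audio_agent(clip)
    )
    fused = max(scores)           # simple max-fusion; the paper may weight modalities
    return fused >= threshold

if __name__ == "__main__":
    alert = asyncio.run(assess("UnauthorizedAccess: root login", [], b""))
    print("trigger remediation" if alert else "no action")
```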
Large language models are increasingly embedded into academic writing workflows, yet existing assistants remain external to the editor, preventing deep interaction with document state, structure, and revision history. This separation makes it impossible to support agentic, context-aware operations directly within LaTeX editors such as Overleaf. We present PaperDebugger, an in-editor, multi-agent, plugin-based academic writing assistant that brings LLM-driven reasoning directly into the writing environment. Enabling such in-editor interaction is technically non-trivial: it requires reliable bidirectional synchronization with the editor, fine-grained version control and patching, secure state management, multi-agent scheduling, and extensible communication with external tools. PaperDebugger addresses these challenges through a Chrome-approved extension, a Kubernetes-native orchestration layer, and a Model Context Protocol (MCP) toolchain that integrates literature search, reference lookup, document scoring, and revision pipelines. Our demo showcases a fully integrated workflow, including localized edits, structured reviews, parallel agent execution, and diff-based updates, encapsulated within a minimal-intrusion user interface (UI). Early aggregated analytics demonstrate active user engagement and validate the practicality of an editor-native, agentic writing assistant. More details about this demo and a video can be found at this https URL.
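As a rough illustration of the diff-based, localized edits mentioned above, the sketch below produces a unified diff for a small LaTeX revision with Python's difflib; PaperDebugger's actual patch format and synchronization protocol are not shown.

```python
# Minimal sketch of diff-based edits using difflib; PaperDebugger's actual
# patching and editor-synchronization code are not reproduced here.
import difflib

original = "\\section{Intro}\nLLMs is increasingly used in writing.\n"
revised  = "\\section{Intro}\nLLMs are increasingly used in writing.\n"

patch = difflib.unified_diff(
    original.splitlines(keepends=True),
    revised.splitlines(keepends=True),
    fromfile="main.tex", tofile="main.tex (revised)",
)
print("".join(patch))   # a localized, reviewable edit rather than a full rewrite
```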
SpeContext, developed by researchers at Shanghai Jiao Tong University, introduces an algorithm and system co-design that uses a lightweight distilled model for speculative KV cache sparsity, enabling efficient long-context reasoning in LLMs. The approach achieves up to 24.89x throughput improvement in cloud environments and 10.06x speedup on edge GPUs while maintaining accuracy.
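As a conceptual sketch only (not SpeContext's algorithm), the idea of speculative KV-cache sparsity, where a cheap draft scorer nominates which cached positions the full model attends to, can be illustrated in NumPy:

```python
# Conceptual sketch of speculative KV-cache sparsity: a cheap draft scorer
# nominates the KV entries the full model attends to. This illustrates the
# general idea only, not SpeContext's actual distilled-model design.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4096, 64
keys = rng.standard_normal((seq_len, d))
query = rng.standard_normal(d)

# Draft pass: score positions cheaply (here, in a low-rank projected space).
proj = rng.standard_normal((d, 8))                    # stand-in for a distilled scorer
draft_scores = (keys @ proj) @ (query @ proj)

# Keep only the top-k positions for the full model's attention.
k = 256
keep = np.argpartition(draft_scores, -k)[-k:]
sparse_scores = keys[keep] @ query / np.sqrt(d)
weights = np.exp(sparse_scores - sparse_scores.max())
weights /= weights.sum()                              # attention over k << seq_len entries
print(f"attending to {k}/{seq_len} cached positions")
```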
If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, so it is worth considering how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, inter-satellite links using free-space optics, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via an 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs were radiation-tested: they survive a total ionizing dose equivalent to a 5-year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach ≲$200/kg by the mid-2030s.
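The launch-cost projection rests on a learning-curve (Wright's law) analysis: unit cost falls by a fixed fraction with each doubling of cumulative volume. The sketch below uses illustrative parameters, not the paper's fitted values.

```python
# Wright's-law sketch: cost per kg falls by a fixed fraction with each doubling
# of cumulative launch mass. Starting cost, growth factor, and learning rates
# below are illustrative assumptions, not the paper's fitted values.
import math

def unit_cost(c0: float, cumulative: float, base: float, learning_rate: float) -> float:
    """Cost after growing cumulative volume from `base` to `cumulative`, given a
    cost reduction of `learning_rate` per doubling (0.20 means -20%/doubling)."""
    doublings = math.log2(cumulative / base)
    return c0 * (1.0 - learning_rate) ** doublings

# e.g. ~$1,500/kg today; assume cumulative mass to LEO grows 100x by the mid-2030s.
for lr in (0.15, 0.20, 0.25):
    print(f"learning rate {lr:.0%}: ${unit_cost(1500, 100.0, 1.0, lr):,.0f}/kg")
```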
We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or to carry out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities by presenting multiple papers it generated in many different scientific disciplines, such as astrophysics, biology, biophysics, biomedical informatics, chemistry, materials science, mathematical physics, medicine, neuroscience and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this by showing a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at this https URL. A Denario demo can also be run directly on the web at this https URL, and the full app will be deployed on the cloud.
Time series forecasting and anomaly detection are common tasks for practitioners in industries such as retail, manufacturing, advertising and energy. Two unique challenges stand out: (1) efficiently and accurately forecasting time series or detecting anomalies in large volumes automatically; and (2) ensuring interpretability of results to effectively incorporate business insights. We present ARIMA_PLUS, a novel framework that overcomes these two challenges through a unique combination of (a) accurate and interpretable time series models and (b) scalable, fully managed system infrastructure. The model has a sequential and modular structure to handle different components of the time series, including holiday effects, seasonality, trend, and anomalies, which enables high interpretability of the results. Novel enhancements are made to each module, and a unified framework is established to address both forecasting and anomaly detection tasks simultaneously. In terms of accuracy, a comprehensive benchmark on the 42 public datasets in the Monash forecasting repository shows superior performance over not only well-established statistical alternatives (such as ETS, ARIMA, TBATS, Prophet) but also newer neural network models (such as DeepAR, N-BEATS, PatchTST, TimeMixer). In terms of infrastructure, it is built directly into the query engine of BigQuery in Google Cloud. It uses a simple SQL interface and automates tedious technicalities such as data cleaning and model selection. It scales automatically with managed cloud computational and storage resources, making it possible to forecast 100 million time series in only 1.5 hours, with a throughput of more than 18,000 time series per second. In terms of interpretability, we present several case studies demonstrating the time series insights it generates and the customizability it offers.
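ARIMA_PLUS is exposed through BigQuery ML's SQL interface; a minimal sketch of training and forecasting via the Python client follows, with project, dataset, table, and column names as placeholders.

```python
# Minimal sketch of training and forecasting with ARIMA_PLUS through BigQuery ML.
# Project, dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE OR REPLACE MODEL `my_project.my_dataset.sales_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'sales',
  time_series_id_col = 'store_id'        -- fit many series in one statement
) AS
SELECT date, sales, store_id
FROM `my_project.my_dataset.sales`
""").result()

forecast = client.query("""
SELECT *
FROM ML.FORECAST(MODEL `my_project.my_dataset.sales_model`,
                 STRUCT(30 AS horizon, 0.95 AS confidence_level))
""").result()

for row in forecast:
    print(row["store_id"], row["forecast_timestamp"], row["forecast_value"])
```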
With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction, especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large for mobile deployment or prohibitively costly (e.g., cloud-only closed-source MLLMs). To resolve this, we propose LightAgent, a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models, while avoiding their drawbacks. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making, integrates an efficient long-reasoning mechanism to utilize historical interactions under tight resources, and defaults to on-device execution, escalating only challenging subtasks to the cloud via real-time complexity assessment. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
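The device-cloud routing pattern (on-device by default, cloud only when a cheap complexity estimate crosses a threshold) can be sketched as below; the scoring heuristic, threshold, and both model stubs are illustrative assumptions.

```python
# Sketch of the device-cloud routing pattern described above: run on-device by
# default, escalate only when a cheap complexity estimate crosses a threshold.
# The scoring heuristic and both model stubs are illustrative assumptions.
def complexity_score(screenshot, instruction: str, history: list[str]) -> float:
    # Stand-in heuristic; the paper uses a learned real-time assessment.
    return min(1.0, 0.1 * len(history) + 0.05 * len(instruction.split()))

def on_device_model(screenshot, instruction, history) -> str:
    return "tap(settings_icon)"          # small (e.g. 3B) local model

def cloud_model(screenshot, instruction, history) -> str:
    return "plan: open settings > wifi"  # large remote model, billed per call

def next_action(screenshot, instruction, history, threshold=0.6) -> str:
    if complexity_score(screenshot, instruction, history) < threshold:
        return on_device_model(screenshot, instruction, history)
    return cloud_model(screenshot, instruction, history)   # escalate rare hard subtasks

print(next_action(None, "open wifi settings", []))
```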
The Chinese University of Hong Kong, Shenzhen, introduces a goal-driven framework for surveying Root Cause Analysis (RCA) research, formalizing RCA as mapping observational data to a complete incident propagation graph. An analysis of 135 papers identifies key research gaps, especially the field's focus on "point-finding" instead of "graph-building" RCA, and calls for next-generation benchmarks that include complete ground-truth causal graphs.
Bauplan Labs developed an iterative AI-driven methodology for distributed systems design, employing Large Language Models to generate novel scheduling policies for a Function-as-a-Service runtime. This approach, validated through a simulator, achieved throughput improvements of up to 371.1% over a baseline FIFO scheduler.
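The runtime's actual policy interface is not public in this summary; the sketch below shows the kind of pluggable scheduling policy such an LLM-driven loop would mutate, alongside the FIFO baseline mentioned.

```python
# Sketch of a pluggable FaaS scheduling-policy interface with the FIFO baseline
# mentioned above; the interface and fields are illustrative assumptions, not
# Bauplan's actual runtime API.
from dataclasses import dataclass

@dataclass
class Invocation:
    fn: str
    arrival: float
    est_runtime: float

def fifo_policy(queue: list[Invocation]) -> Invocation:
    """Baseline: earliest arrival first."""
    return min(queue, key=lambda inv: inv.arrival)

def llm_candidate_policy(queue: list[Invocation]) -> Invocation:
    """Example of a generated alternative: shortest-job-first with an aging term
    to avoid starvation. An LLM-driven loop would propose many such variants."""
    return min(queue, key=lambda inv: inv.est_runtime - 0.1 * (10.0 - inv.arrival))

queue = [Invocation("etl", 0.0, 9.0), Invocation("ping", 1.0, 0.1)]
print(fifo_policy(queue).fn, llm_candidate_policy(queue).fn)   # etl ping
```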
KV cache has traditionally been stored in GPU memory to accelerate the decoding phase of large language model (LLM) inference. However, it is increasingly necessary to move KV caches outside GPU devices, to enable cache reuse across different queries and inference engines. Our real-world usage statistics confirm this trend: over time, the total KV cache stored by users has grown rapidly, far exceeding the capacity of GPU memory. Despite this need, an efficient solution for offloading and transferring KV caches has been lacking. We present LMCACHE, the first and so far the most efficient open-source KV caching solution, which extracts and stores KV caches generated by modern LLM engines (vLLM and SGLang) out of the GPU memory and shares them across engines and queries. LMCACHE supports both cache offloading (prefix reuse across queries) and prefill-decode (PD) disaggregation (cross-engine/GPU cache transfer). LMCACHE's high performance and wide adoption stem from the following contributions: (1) highly optimized KV cache data movement powered by batched data movement operations and compute-I/O pipelining; (2) a modular KV cache connector component, decoupling LMCACHE from the rapid evolution of inference engines; (3) a first-class control API for flexible cache orchestration across GPU, CPU, storage, and network layers. Our evaluation shows that combining LMCACHE with vLLM achieves up to 15x improvement in throughput across workloads such as multi-round question answering and document analysis. Large-scale adoption of LMCACHE in enterprise settings has provided valuable insights: for example, fetching KV caches from remote storage unsurprisingly benefits prefill delay, and context truncation, a widely applied technique in industry, can cut the prefix cache hit ratio in half. The source code of LMCACHE is at: this https URL.
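LMCACHE's connector API is not reproduced here, but the core offloading idea, storing KV tensors off-GPU keyed by a hash of the token prefix and fetching them when another query shares that prefix, can be sketched as follows.

```python
# Conceptual sketch of prefix-keyed KV-cache offloading: store KV tensors
# off-GPU keyed by a hash of the token prefix, and fetch them when another
# query shares that prefix. This is not LMCACHE's actual connector API.
import hashlib
import numpy as np

class PrefixKVStore:
    def __init__(self):
        self._cpu_store: dict[str, np.ndarray] = {}   # stand-in for CPU/disk/remote tiers

    @staticmethod
    def _key(token_ids: list[int]) -> str:
        return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

    def offload(self, token_ids: list[int], kv: np.ndarray) -> None:
        self._cpu_store[self._key(token_ids)] = kv    # evicted from GPU, kept reusable

    def lookup(self, token_ids: list[int]) -> np.ndarray | None:
        # A real connector matches the longest stored prefix; this checks
        # whole-prefix hits only, for brevity.
        return self._cpu_store.get(self._key(token_ids))

store = PrefixKVStore()
prefix = [101, 2054, 2003]                              # shared system-prompt tokens
store.offload(prefix, np.zeros((2, len(prefix), 64)))   # toy (layers, tokens, dim) shape
assert store.lookup(prefix) is not None                 # next query skips prefill for the prefix
```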
Researchers at UC Berkeley demonstrated that an AI-Driven Research for Systems (ADRS) framework can autonomously discover and refine algorithms for complex systems problems, frequently surpassing human-designed state-of-the-art solutions within hours and at low computational cost. The approach utilizes large language models (LLMs) to iteratively generate and evaluate solutions within high-fidelity simulators across diverse systems domains.
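In schematic form, the generate-and-evaluate loop looks like the sketch below, where llm_propose and simulate are stubs standing in for an LLM API call and a high-fidelity simulator; the real framework's interfaces are not reproduced.

```python
# Schematic of the generate-and-evaluate loop described above. `llm_propose`
# and `simulate` are stubs standing in for an LLM API call and a high-fidelity
# simulator; the real framework's interfaces are not reproduced here.
import random

def llm_propose(best_solution: str, feedback: float) -> str:
    # Stand-in for prompting an LLM with the current best solution and its score.
    return best_solution + f"+tweak{random.randint(0, 99)}"

def simulate(solution: str) -> float:
    # Stand-in for a high-fidelity simulator returning a performance score.
    return random.random()

def adrs_loop(seed_solution: str, iterations: int = 50) -> tuple[str, float]:
    best, best_score = seed_solution, simulate(seed_solution)
    for _ in range(iterations):
        candidate = llm_propose(best, best_score)
        score = simulate(candidate)
        if score > best_score:                 # greedy; real systems may keep a population
            best, best_score = candidate, score
    return best, best_score

print(adrs_loop("fifo_scheduler"))
```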
AutoClimDS introduces an agentic AI system powered by a knowledge graph to automate complex climate data science workflows. It demonstrates the ability to accurately replicate specific climate analyses, such as sea level trend reports, from natural language queries, thereby reducing the technical barrier for researchers and enhancing reproducibility.
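The kind of analysis being replicated, e.g. a linear sea-level trend, ultimately reduces to a regression over a located dataset; a toy sketch with synthetic data follows (the real system would locate the data via its knowledge graph).

```python
# The kind of analysis the agent reproduces, e.g. a sea-level trend estimate,
# reduces to a regression over a time series. Synthetic data for illustration;
# the system would instead locate the real dataset via its knowledge graph.
import numpy as np

years = np.arange(1993, 2024)
rng = np.random.default_rng(1)
sea_level_mm = 3.3 * (years - 1993) + rng.normal(0, 2.0, years.size)  # toy series

slope, intercept = np.polyfit(years, sea_level_mm, deg=1)
print(f"estimated trend: {slope:.2f} mm/yr")   # ~3.3 mm/yr on this toy data
```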
While simulators exist for vehicular IoT nodes communicating with the Cloud through Edge nodes in a fully simulated osmotic architecture, they often lack support for dynamic agent planning and optimisation to minimise vehicular battery consumption while ensuring fair communication times. Addressing these challenges requires extending current simulator architectures with AI algorithms for both traffic prediction and dynamic agent planning. This paper presents an extension of SimulatorOrchestrator (SO) to meet these requirements. Preliminary results over a realistic urban dataset show that utilising vehicular planning algorithms can lead to improved battery and QoS performance compared with traditional shortest path algorithms. The additional inclusion of desirability areas enabled more ambulances to be routed to their target destinations while utilising less energy to do so, compared to traditional and weighted algorithms without desirability considerations.
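One plausible way desirability areas enter route planning, sketched below with networkx, is to discount edge energy costs inside desirable zones before running Dijkstra; the weighting scheme and toy graph are assumptions, not the extended simulator's model.

```python
# Sketch of desirability-aware routing with networkx: discount edge energy
# costs inside desirable zones, then run Dijkstra. The weighting scheme and
# graph are illustrative assumptions, not the extended simulator's model.
import networkx as nx

G = nx.Graph()
G.add_edge("A", "B", energy=5.0, desirable=False)
G.add_edge("B", "D", energy=5.0, desirable=False)
G.add_edge("A", "C", energy=6.0, desirable=True)   # slightly longer but desirable
G.add_edge("C", "D", energy=6.0, desirable=True)

def weight(u, v, attrs, bonus: float = 0.25):
    # Desirable edges are discounted, steering ambulances through target areas.
    factor = 1.0 - bonus if attrs["desirable"] else 1.0
    return attrs["energy"] * factor

shortest = nx.shortest_path(G, "A", "D", weight="energy")   # pure energy: A-B-D
desirable = nx.shortest_path(G, "A", "D", weight=weight)    # discounted: A-C-D
print(shortest, desirable)
```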
We introduce NameTag 3, an open-source tool and cloud-based web service for multilingual, multidataset, and multitagset named entity recognition (NER), supporting both flat and nested entities. NameTag 3 achieves state-of-the-art results on 21 test datasets in 15 languages and remains competitive on the rest, even against larger models. It is available as a command-line tool and as a cloud-based service, enabling use without local installation. NameTag 3 web service currently provides flat NER for 17 languages, trained on 21 corpora and three NE tagsets, all powered by a single 355M-parameter fine-tuned model; and nested NER for Czech, powered by a 126M fine-tuned model. The source code is licensed under open-source MPL 2.0, while the models are distributed under non-commercial CC BY-NC-SA 4.0. Documentation is available at this https URL, source code at this https URL, and trained models via this https URL. The REST service and the web application can be found at this https URL. A demonstration video is available at this https URL.
Robo-DM provides an efficient, open-source data management toolkit designed for large robot datasets, addressing challenges of cost, complexity, and performance. The system achieves substantial data size reductions and faster loading times while preserving the performance of trained robot policies.
This paper evaluates the performance and cost-effectiveness of different cloud CPU architectures for vector similarity search, demonstrating that AWS Graviton3 often provides the best 'queries per dollar' despite being an older architecture. The study provides comparative benchmarks for HNSW and IVF algorithms with various quantization techniques across ARM and x86 CPUs.
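The 'queries per dollar' metric can be reproduced in spirit by measuring query throughput with hnswlib and normalizing by an hourly instance price; the index parameters and price below are illustrative, not the paper's benchmark configuration.

```python
# Sketch of the 'queries per dollar' metric: measure QPS with hnswlib, then
# normalize by an hourly instance price. Prices and index parameters are
# illustrative, not the paper's benchmark configuration.
import time
import numpy as np
import hnswlib

dim, n, n_queries = 128, 100_000, 10_000
rng = np.random.default_rng(0)
data = rng.standard_normal((n, dim)).astype(np.float32)
queries = rng.standard_normal((n_queries, dim)).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data)
index.set_ef(64)                                  # recall/latency trade-off knob

start = time.perf_counter()
index.knn_query(queries, k=10)
qps = n_queries / (time.perf_counter() - start)

hourly_price_usd = 0.58                           # placeholder on-demand rate
queries_per_dollar = qps * 3600 / hourly_price_usd
print(f"{qps:,.0f} QPS -> {queries_per_dollar:,.0f} queries per dollar")
```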
This survey systematically evaluates 25 open-source and commercial Large Language Model inference engines, categorizing their optimization techniques and assessing their practical performance across dimensions like ease-of-use, scalability, and hardware support. It aims to provide practical guidance for researchers and developers in selecting and designing efficient LLM deployment solutions based on current ecosystem trends.