Oregon State University
An extensive international collaboration offers the first systematic review of self-evolving agents, establishing a unified theoretical framework categorized by 'what to evolve,' 'when to evolve,' and 'how to evolve'. The work consolidates diverse research, highlights key challenges, and maps applications, aiming to guide the development of AI systems capable of continuous autonomous improvement.
RNA design is the search for a sequence or set of sequences that will fold into predefined structures, also known as the inverse problem of RNA folding. While numerous RNA design methods have been invented to find sequences capable of folding into a target structure, little attention has been given to identifying undesignable structures according to the minimum free energy (MFE) criterion under the Turner model. In this paper, we address this gap by first introducing mathematical theorems outlining sufficient conditions for recognizing undesignable structures, then proposing efficient algorithms, guided by these theorems, to verify the undesignability of RNA structures. Applying these theorems and algorithms to the Eterna100 puzzles, we efficiently establish that 15 of the puzzles are indeed undesignable. In addition, we provide specific insights from the study of undesignability, in the hope of deepening the understanding of RNA folding and RNA design.
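In sketch form (notation ours, not necessarily the paper's), designability asks whether some sequence has the target as its unique MFE structure, and one natural sufficient condition for ruling this out is a rival structure that never loses:

```latex
% y* is designable iff some sequence x has y* as its unique MFE structure:
\exists\, x:\; \Delta G(x, y^*) < \Delta G(x, y) \quad \forall\, y \neq y^*.

% Sufficient condition for undesignability: a rival structure y' whose
% free energy never exceeds that of y* on any sequence, so y* can never
% win the MFE comparison.
\exists\, y' \neq y^* \;\text{s.t.}\; \forall x:\; \Delta G(x, y') \le \Delta G(x, y^*)
\;\Longrightarrow\; y^* \text{ is undesignable.}
```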
This research introduces and defines automated failure attribution for Large Language Model (LLM) multi-agent systems, providing a method to automatically identify which agent caused a task failure and at which step it occurred. The study evaluates three LLM-based attribution methods on a newly created, human-annotated dataset called "Who&When," finding that different methods excel at agent-level versus step-level accuracy and that performance degrades with longer conversation logs.
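A hedged sketch of what an "all-at-once" LLM-based attribution baseline might look like: feed the full multi-agent log to a judge model and parse out the responsible agent and failure step. Here `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompt wording is our illustration, not the paper's code.

```python
import json
import re

def attribute_failure(log: list[dict], call_llm) -> tuple[str, int]:
    """log: [{"step": int, "agent": str, "content": str}, ...]"""
    transcript = "\n".join(
        f'[step {e["step"]}] {e["agent"]}: {e["content"]}' for e in log
    )
    prompt = (
        "The following multi-agent conversation failed to solve its task.\n"
        f"{transcript}\n"
        'Reply with JSON: {"agent": <name>, "step": <int>} identifying the '
        "decisive mistake."
    )
    reply = call_llm(prompt)
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # tolerate extra prose
    if match is None:
        return "unknown", -1
    verdict = json.loads(match.group(0))
    return verdict["agent"], int(verdict["step"])
```

The paper's finding that performance degrades with log length is intuitive from this shape: the entire transcript must fit in, and be reasoned over within, a single context window.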
Researchers from ByteDance and partner universities develop a visualization framework for analyzing where Large Vision-Language Models (LVLMs) focus their attention when answering questions about images, revealing that different vision architectures lead to distinct attention patterns and that model performance does not always align with visual understanding behavior.
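A minimal sketch of the core visualization step such a framework typically relies on (our simplification, not the paper's code): average attention over layers and heads, keep only the columns corresponding to image tokens, and reshape them into a patch-grid heatmap. Token positions and grid size below are illustrative assumptions.

```python
import torch

def image_attention_heatmap(attentions, img_start, n_patches, grid):
    """attentions: list of per-layer tensors shaped [batch, heads, seq, seq]."""
    stacked = torch.stack(attentions).mean(dim=(0, 2))   # avg layers+heads -> [b, seq, seq]
    last_token_attn = stacked[:, -1, :]                  # attention from the final generated token
    img_attn = last_token_attn[:, img_start:img_start + n_patches]
    heatmap = img_attn.reshape(-1, *grid)                # [b, H, W] over the patch grid
    return heatmap / heatmap.amax(dim=(-1, -2), keepdim=True)

# Toy usage with random attention maps and a 24x24 patch grid:
fake = [torch.softmax(torch.randn(1, 8, 600, 600), dim=-1) for _ in range(4)]
print(image_attention_heatmap(fake, img_start=5, n_patches=576, grid=(24, 24)).shape)
```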
A new framework called Heima allows Multimodal Large Language Models (MLLMs) to perform complex reasoning efficiently by compressing verbose Chain-of-Thought (CoT) steps into single "thinking tokens." This approach reduces token generation to as low as 6% of original counts while maintaining or improving zero-shot accuracy on diverse benchmarks and enabling interpretation of the hidden reasoning process through a dedicated decoder.
Researchers at UC Berkeley and Oregon State University introduced ImageNet-C and ImageNet-P, comprehensive benchmarks for evaluating neural network robustness to common image corruptions and perturbations. Their evaluations demonstrated that modern deep learning models, despite high clean accuracy, often exhibit similar or worse relative robustness to these natural degradations compared to older architectures, alongside significant prediction instability.
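For context, the benchmark's headline metric (our transcription of the commonly cited definition) normalizes a classifier f's error under corruption c, summed over five severities s, by AlexNet's corresponding errors, then averages over the set C of corruption types:

```latex
\mathrm{CE}^{f}_{c} \;=\; \frac{\sum_{s=1}^{5} E^{f}_{s,c}}{\sum_{s=1}^{5} E^{\mathrm{AlexNet}}_{s,c}},
\qquad
\mathrm{mCE}^{f} \;=\; \frac{1}{|C|} \sum_{c \in C} \mathrm{CE}^{f}_{c}.
```

The AlexNet normalization is what makes corruptions of very different difficulty comparable, and it is why the paper can speak of "relative" robustness gains separately from gains in clean accuracy.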
The paper introduces Vision-and-Language Navigation in Continuous Environments (VLN-CE), a more realistic benchmark where agents execute low-level actions in 3D environments, moving beyond prior "nav-graph" assumptions. This setting significantly challenges existing models, with the best cross-modal attention agent achieving a 0.30 SPL (32% SR) on val-unseen, revealing a substantial performance gap compared to nav-graph-based methods.
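The SPL figure quoted above is Success weighted by Path Length, which discounts each successful episode by how far the agent's path exceeded the shortest path:

```latex
\mathrm{SPL} \;=\; \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i,\, \ell_i)}
```

Here S_i is the binary success indicator for episode i, ℓ_i the shortest-path distance from start to goal, and p_i the length of the path the agent actually took; an SPL of 0.30 alongside a 32% success rate thus indicates that successful episodes were navigated fairly efficiently.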
Researchers at Oregon State University developed `decPLM`, a hierarchical reinforcement learning framework enabling teams of quadrupedal robots to cooperatively transport ungraspable objects through physical contact, without communication or rigid coupling. The approach, leveraging a novel constellation reward, demonstrates effective generalization from two-robot training to teams of up to ten robots, reducing velocity errors by 80% and achieving drop rates below 1% for diverse payloads.
Natural Gradient Descent, a second-order optimization method motivated by information geometry, uses the Fisher Information Matrix in place of the Hessian that is typically employed. In many cases, moreover, the Fisher Information Matrix is equivalent to the Generalized Gauss-Newton matrix, and both approximate the Hessian. The method is an appealing alternative to stochastic gradient descent, potentially leading to faster convergence. However, being a second-order method makes it infeasible to apply directly to problems with huge numbers of parameters and data points, which is evident in the deep learning community's continued reliance on stochastic gradient descent since its beginnings. In this paper, we examine different perspectives on the natural gradient method, survey current developments in its efficient and scalable empirical approximations, and evaluate their performance with extensive experiments.
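For concreteness, the update under study (in its standard textbook form) preconditions the loss gradient with the inverse Fisher Information Matrix:

```latex
F(\theta) \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}\right],
\qquad
\theta_{t+1} \;=\; \theta_t \;-\; \eta\, F(\theta_t)^{-1}\, \nabla_\theta \mathcal{L}(\theta_t).
```

Storing F costs quadratic memory and inverting it costs cubic time in the number of parameters, which is precisely what the surveyed scalable approximations (e.g., block-diagonal or Kronecker-factored forms) are designed to avoid.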
This technical report presents Prithvi-EO-2.0, a new geospatial foundation model that offers significant improvements over its predecessor, Prithvi-EO-1.0. Trained on 4.2M global time series samples from NASA's Harmonized Landsat and Sentinel-2 data archive at 30m resolution, the new 300M and 600M parameter models incorporate temporal and location embeddings for enhanced performance across various geospatial tasks. Through extensive benchmarking with GEO-Bench, the 600M version outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (i.e., from 0.1m to 15m). The results demonstrate the versatility of the model in both classical earth observation and high-resolution applications. Early involvement of end-users and subject matter experts (SMEs) was among the key factors that contributed to the project's success. In particular, SME involvement allowed for constant feedback on model and dataset design, as well as successful customization for diverse SME-led applications in disaster response, land use and crop mapping, and ecosystem dynamics monitoring. Prithvi-EO-2.0 is available on Hugging Face and IBM terratorch, with additional resources on GitHub. The project exemplifies the Trusted Open Science approach embraced by all involved organizations.
A comprehensive review surveys machine learning methods for generating synthetic data, exploring diverse applications, the capabilities of deep generative models, and critical ethical considerations. The paper outlines current challenges and future opportunities in creating high-quality, privacy-preserving synthetic datasets across various fields.
4D-LRM introduces a Transformer-based Large Reconstruction Model that leverages 4D Gaussian Splatting to reconstruct dynamic objects from sparse, unconstrained input views and timestamps into full space-time representations. The model reconstructs a 24-frame dynamic object in under 1.5 seconds on an A100 GPU, achieving high fidelity and generalizing to novel view-time combinations.
We propose Long-LRM, a feed-forward 3D Gaussian reconstruction model for instant, high-resolution, 360° wide-coverage, scene-level reconstruction. Specifically, it takes in 32 input images at a resolution of 960x540 and produces the Gaussian reconstruction in just 1 second on a single A100 GPU. To handle the long sequence of 250K tokens brought by the large input size, Long-LRM features a mixture of the recent Mamba2 blocks and classical transformer blocks, enhanced by a lightweight token-merging module and Gaussian pruning steps that balance quality and efficiency. We evaluate Long-LRM on the large-scale DL3DV benchmark and Tanks&Temples, demonstrating reconstruction quality comparable to optimization-based methods while achieving an 800x speedup over them and handling an input size at least 60x larger than previous feed-forward approaches. We conduct extensive ablation studies on our model design choices for both rendering quality and computational efficiency. We also explore Long-LRM's compatibility with other Gaussian variants such as 2D GS, which enhances Long-LRM's ability in geometry reconstruction. Project page: this https URL
This Systematization of Knowledge (SoK) comprehensively surveys watermarking techniques for AI-generated content, formalizing key properties, analyzing trade-offs, and reviewing state-of-the-art methods across modalities. It aims to guide future research and inform policy development for AI safety and transparency.
This research introduces AutoDefense, a multi-agent system designed to defend Large Language Models against jailbreak attacks by filtering harmful content from generated responses. The framework achieved a reduction in Attack Success Rate on GPT-3.5 from 55.74% to 7.95% while maintaining a low false positive rate on legitimate requests.
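A hedged sketch of a response-filtering defense in this spirit (role names and prompts are our illustration, not the paper's exact agent configuration; `call_llm` is again a hypothetical client):

```python
REFUSAL = "I'm sorry, but I can't help with that."

def defend(response: str, call_llm) -> str:
    # Role 1: infer what the response would actually accomplish.
    intention = call_llm(
        "Summarize the underlying intention of this response:\n" + response
    )
    # Role 2: judge the response in light of that inferred intention.
    verdict = call_llm(
        "You are a safety judge. Given a response and its inferred intention, "
        f"answer VALID or INVALID.\nResponse: {response}\nIntention: {intention}"
    )
    # Only the judgment gates the output; benign responses pass through
    # unchanged, which is what keeps the false positive rate low.
    return response if "INVALID" not in verdict.upper() else REFUSAL
```

Filtering the generated response rather than the incoming prompt means the defense works even when the jailbreak itself is obfuscated: it only has to recognize harmful output.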
Traditional SLAM algorithms excel at camera tracking, but typically produce incomplete, low-resolution maps that are not tightly integrated with semantic prediction. Recent work integrates Gaussian Splatting (GS) into SLAM to enable dense, photorealistic 3D mapping, yet existing GS-based SLAM methods require per-scene optimization that is slow and consumes an excessive number of Gaussians. We present GS4, the first generalizable GS-based semantic SLAM system. Compared with prior approaches, GS4 runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance across color, depth, semantic mapping, and camera tracking. From an RGB-D video stream, GS4 incrementally builds and updates a set of 3D Gaussians using a feed-forward network. First, the Gaussian Prediction Model estimates a sparse set of Gaussian parameters from the input frame, integrating color and semantic prediction in the same backbone. Then, the Gaussian Refinement Network merges new Gaussians with the existing set while avoiding redundancy. Finally, when significant pose changes are detected, we perform only 1-5 iterations of joint Gaussian-pose optimization to correct drift, remove floaters, and further improve tracking accuracy. Experiments on the real-world ScanNet and ScanNet++ benchmarks demonstrate state-of-the-art semantic SLAM performance, with strong generalization shown through zero-shot transfer to the NYUv2 and TUM RGB-D datasets.
The paper demonstrates that imposing prompt-based organizational structures significantly improves the cooperative efficiency and communication of multi-LLM-agent teams in embodied tasks. It also introduces a "Criticize-Reflect" framework, enabling LLM agents to autonomously optimize these structures, leading to reduced task completion times and communication overhead.
We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling, achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience), over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the task: near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks, the analog of ImageNet pre-training plus task-specific fine-tuning for embodied AI. Our model outperforms ImageNet-pretrained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available).
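A hedged sketch of the decentralized, synchronous update at the heart of this scheme: every worker alternates experience collection with learning, then averages gradients via allreduce, so there is no parameter server and no stale computation. The rollout and the clipped-surrogate `ppo_loss` are stubs, not the paper's code.

```python
import torch
import torch.distributed as dist

def ddppo_step(policy, optimizer, rollout, ppo_loss):
    optimizer.zero_grad()
    ppo_loss(policy, rollout).backward()
    world_size = dist.get_world_size()
    for p in policy.parameters():
        if p.grad is not None:
            # Decentralized sync: sum gradients across all workers, then
            # average, so every replica applies the identical update.
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
```

The paper's other key ingredient, omitted above, is preempting straggler workers once a fixed fraction of them have finished collecting experience, which is what preserves near-linear scaling when simulation times vary across environments.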
Since the publication of the first International AI Safety Report, AI capabilities have continued to improve across key domains. These advances have been driven primarily by new training techniques that teach AI systems to reason step by step and by inference-time enhancements, rather than simply by training larger models. As a result, general-purpose AI systems can solve more complex problems in a range of domains, from scientific research to software development. Performance on benchmarks measuring coding, mathematics, and expert-level science question answering has continued to improve, though reliability challenges persist, with systems excelling on some tasks while failing completely on others. These capability improvements also have implications for multiple risks, including risks from biological weapons and cyber attacks, and they pose new challenges for monitoring and controllability. This update examines how AI capabilities have improved since the first Report, then focuses on key risk areas where substantial new evidence warrants updated assessments.