Researchers from TU Delft, IIT, and DLR developed a hybrid robotic system integrating a rigid manipulator with a soft, octopus-inspired arm, demonstrating its ability to learn and generalize complex contact-rich tasks from single human demonstrations. This platform successfully performs delicate manipulations, navigates narrow openings, and utilizes unconventional grasping strategies with robustness and adaptability.
Open X-Embodiment Collaboration: Google DeepMind; University of Illinois at Urbana-Champaign; University of Freiburg; Carnegie Mellon University; Imperial College London; University of Southern California; New York University; Shanghai Jiao Tong University; the University of Tokyo; Stanford University; The University of Texas at Austin; University of Technology Nuremberg; ETH Zürich; University of California, San Diego; RIKEN; Google Research; Columbia University; Arizona State University; German Aerospace Center; Istituto Italiano di Tecnologia; Max Planck Institute; Queensland University of Technology; Technische Universität Darmstadt; Korea Advanced Institute of Science & Technology; Intrinsic LLC; Flexiv Robotics
The OpenX-Embodiment Collaboration released the Open X-Embodiment (OXE) Dataset, a consolidated collection of over 1 million real robot trajectories from 22 embodiments. This work demonstrates that large RT-X models trained on such diverse data achieve positive transfer and emergent skills across different robot platforms.
Researchers at the University of Toronto, Westlake University, and the University of Electronic Science and Technology of China, along with a global consortium, developed aiXiv, an open-access ecosystem designed for AI-generated scientific content and human-AI collaboration. This platform, featuring a multi-agent review system and iterative refinement, raised the acceptance rate of AI-generated proposals from 0% to 45.2% and papers from 10% to 70% in multi-AI voting, demonstrating enhanced quality and trustworthiness.
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing state-transition matrices to have negative eigenvalues. Building on the interpretation of DeltaNet's recurrence as performing one step of online gradient descent per token on an associative recall loss, we introduce DeltaProduct, which instead takes multiple ($n_h$) steps per token. This naturally leads to diagonal plus rank-$n_h$ state-transition matrices, formed as products of $n_h$ generalized Householder transformations, providing a tunable mechanism to balance expressivity and efficiency. We provide a detailed theoretical characterization of the state-tracking capability of DeltaProduct in finite precision, showing how it improves by increasing $n_h$. Our extensive experiments demonstrate that DeltaProduct outperforms DeltaNet in both state-tracking and language modeling, while also showing significantly improved length extrapolation capabilities.
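To make the construction concrete, here is a minimal NumPy sketch of a state-transition matrix assembled as a product of generalized Householder factors; the key normalization and the choice of beta values are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def generalized_householder(k, beta):
    """One generalized Householder factor H = I - beta * k k^T.
    For unit-norm k and beta in [0, 2], H has eigenvalue 1 - beta along k
    and 1 elsewhere, so negative eigenvalues become reachable."""
    return np.eye(k.shape[0]) - beta * np.outer(k, k)

def deltaproduct_transition(keys, betas):
    """State-transition matrix for one token: a product of n_h generalized
    Householder factors, i.e. the identity plus a rank-n_h update."""
    A = np.eye(keys.shape[1])
    for k, beta in zip(keys, betas):
        k = k / np.linalg.norm(k)              # unit-norm key (assumption)
        A = generalized_householder(k, beta) @ A
    return A

# Toy usage: n_h = 2 micro-steps for one token, d = 4 channels.
rng = np.random.default_rng(0)
A_t = deltaproduct_transition(rng.standard_normal((2, 4)), betas=[1.5, 0.7])
```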
This monograph by Franceschi et al. provides a comprehensive, unified treatment of hyperparameter optimization (HPO) in machine learning, systematically categorizing diverse algorithms and outlining their evolution and practical considerations. It serves as a foundational resource, integrating HPO with advanced ML paradigms and identifying future research directions, particularly concerning foundation models.
VLA-Pilot, an inference-time policy steering method, enables zero-shot deployment of pre-trained Vision-Language-Action (VLA) models by leveraging Multimodal Large Language Models (MLLMs) for open-world objective reasoning and an evolutionary diffusion process for action optimization. The approach by researchers from The Chinese University of Hong Kong and Istituto Italiano di Tecnologia boosts manipulation success rates by an average of 30-31% and demonstrates robust generalization across diverse tasks and robot embodiments, matching or exceeding fine-tuning performance.
A survey systematically reviews imitation learning (IL) research for contact-rich robotic tasks, detailing demonstration collection, learning algorithms, and real-world applications. It highlights the growing role of multimodal data and foundation models in advancing robotic capabilities for complex physical interactions, while also identifying key challenges and future directions in the field.
Researchers from IIT, University of Genoa, and UCL developed high-probability, data-dependent generalization bounds for Gibbs posterior and Langevin Monte Carlo algorithms that remain valid in the overparameterized, low-temperature interpolation regime. The approach successfully differentiates between true generalization on real data and memorization of random labels, achieving non-trivial and tight upper bounds on test error.
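For context on one of the algorithms the bounds cover, below is a minimal sketch of a single unadjusted Langevin Monte Carlo update targeting a Gibbs posterior; the step size and inverse temperature are illustrative values, not ones from the paper.

```python
import numpy as np

def langevin_step(theta, grad_loss, step=1e-3, inv_temp=1e4, rng=None):
    """One unadjusted Langevin update targeting the Gibbs posterior
    proportional to exp(-inv_temp * L(theta)): a gradient step on the
    empirical loss plus Gaussian noise whose scale shrinks as the
    temperature drops, i.e. the low-temperature interpolation regime
    the bounds address."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(theta.shape)
    return theta - step * grad_loss(theta) + np.sqrt(2.0 * step / inv_temp) * noise
```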
Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. Leveraging tools from operator theory, we derive a closed-form expression of the action-value function in terms of the world model via simple matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.
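As a finite-state illustration of the closed-form action-value claim, the sketch below solves the Bellman linear system with plain matrix operations; the paper's actual estimator uses conditional mean embeddings in infinite-dimensional feature spaces, which this tabular special case does not reproduce.

```python
import numpy as np

def q_from_world_model(P, r, Pi, gamma=0.99):
    """Closed-form action-value of a fixed policy from a learned world model.

    P  : (SA, S) estimated transition matrix (rows index state-action pairs)
    r  : (SA,)   estimated expected rewards
    Pi : (S, SA) policy matrix, Pi[s, sa] = pi(a | s) for the matching state
    The Bellman equation Q = r + gamma * P @ Pi @ Q is linear in Q,
    so Q is recovered by solving one linear system.
    """
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P @ Pi, r)
```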
This paper introduces a novel Model Predictive Control (MPC) implementation for legged robot locomotion that leverages GPU parallelization. Our approach enables both temporal and state-space parallelization by incorporating a parallel associative scan to solve the primal-dual Karush-Kuhn-Tucker (KKT) system. In this way, the optimal control problem is solved in $\mathcal{O}(n \log N + m)$ complexity, instead of $\mathcal{O}(N(n + m)^3)$, where $n$, $m$, and $N$ are the dimensions of the system state, the control vector, and the length of the prediction horizon. We demonstrate the advantages of this implementation over two state-of-the-art solvers (acados and crocoddyl), achieving up to a 60% improvement in runtime for Whole Body Dynamics (WB)-MPC and a 700% improvement for Single Rigid Body Dynamics (SRBD)-MPC when varying the prediction horizon length. The presented formulation scales efficiently with the problem state dimensions as well, enabling the definition of a centralized controller for up to 16 legged robots that can be computed in less than 25 ms. Furthermore, thanks to the JAX implementation, the solver supports large-scale parallelization across multiple environments, allowing the possibility of performing learning with the MPC in the loop directly on GPU.
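The temporal parallelization hinges on casting a sequential recursion as an associative operation. Below is a minimal JAX sketch for the simpler affine recursion x_{t+1} = A_t x_t + b_t; the paper applies the same idea to the primal-dual KKT system, which this sketch does not attempt to reproduce.

```python
import jax
import jax.numpy as jnp

def combine(f, g):
    """Associative composition of batched affine maps x -> A x + b:
    applying f first, then g, gives A = A_g A_f and b = A_g b_f + b_g."""
    A_f, b_f = f
    A_g, b_g = g
    return A_g @ A_f, jnp.einsum('...ij,...j->...i', A_g, b_f) + b_g

def parallel_rollout(A, b, x0):
    """All states of x_{t+1} = A_t x_t + b_t in O(log N) depth.
    Folding x0 into the first offset makes the scanned offsets equal
    the state trajectory itself."""
    b = b.at[0].set(A[0] @ x0 + b[0])
    _, xs = jax.lax.associative_scan(combine, (A, b))
    return xs  # xs[t] == x_{t+1}

# Example: N = 8 steps of a 3-dimensional linear system.
key = jax.random.PRNGKey(0)
A = 0.1 * jax.random.normal(key, (8, 3, 3))
b = jnp.ones((8, 3))
xs = parallel_rollout(A, b, jnp.zeros(3))
```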
PanTS is a large-scale, multi-institutional dataset curated to advance research in pancreatic CT analysis. It contains 36,390 CT scans from 145 medical centers, with expert-validated, voxel-wise annotations of over 993,000 anatomical structures, covering pancreatic tumors, pancreas head, body, and tail, and 24 surrounding anatomical structures such as vascular/skeletal structures and abdominal/thoracic organs. Each scan includes metadata such as patient age, sex, diagnosis, contrast phase, in-plane spacing, and slice thickness. AI models trained on PanTS achieve significantly better performance in pancreatic tumor detection, localization, and segmentation compared to those trained on existing public datasets. Our analysis indicates that these gains are directly attributable to the 16x larger-scale tumor annotations and indirectly supported by the 24 additional surrounding anatomical structures. As the largest and most comprehensive resource of its kind, PanTS offers a new benchmark for developing and evaluating AI models in pancreatic CT analysis.
Model Predictive Path Integral control is a powerful sampling-based approach suitable for complex robotic tasks due to its flexibility in handling nonlinear dynamics and non-convex costs. However, its applicability in real-time, high-frequency robotic control scenarios is limited by computational demands. This paper introduces Feedback-MPPI (F-MPPI), a novel framework that augments standard MPPI by computing local linear feedback gains derived from sensitivity analysis inspired by Riccati-based feedback used in gradient-based MPC. These gains allow for rapid closed-loop corrections around the current state without requiring full re-optimization at each timestep. We demonstrate the effectiveness of F-MPPI through simulations and real-world experiments on two robotic platforms: a quadrupedal robot performing dynamic locomotion on uneven terrain and a quadrotor executing aggressive maneuvers with onboard computation. Results illustrate that incorporating local feedback significantly improves control performance and stability, enabling robust, high-frequency operation suitable for complex robotic systems.
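Conceptually, the feedback layer acts like an LQR-style correction applied between re-optimizations. A minimal sketch, assuming the gain K is already available (computing it via sensitivity analysis is the paper's contribution and is not reproduced here):

```python
import numpy as np

def fmppi_action(u_nom, K, x, x_nom):
    """Closed-loop action between MPPI re-solves: the nominal feed-forward
    input plus a local linear feedback correction around the nominal state.
    K is assumed given; in the paper it comes from sensitivity analysis of
    the MPPI solution, in the spirit of Riccati gains."""
    return u_nom + K @ (x - x_nom)

# A high-rate loop would apply fmppi_action at every control tick while the
# sampling-based MPPI re-optimization runs at a lower rate.
```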
This research from IIT, University of Freiburg, ELLIS Institute Tübingen, and UCL demonstrates that allowing negative eigenvalues in Linear Recurrent Neural Networks (LRNNs) fundamentally unlocks their ability to perform state-tracking tasks. The study provides theoretical proofs that this modification enables LRNNs to recognize any regular language and empirically shows perfect performance on parity and improved perplexity on code and math language modeling datasets.
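The parity result is easy to make concrete: a one-dimensional linear recurrence whose transition is -1 on an input of 1 tracks parity exactly, something a diagonal LRNN restricted to nonnegative eigenvalues cannot do. A minimal sketch:

```python
def parity(bits):
    """1-D linear RNN h_t = a(x_t) * h_{t-1} with a(1) = -1 and a(0) = +1.
    The sign of h encodes the parity of the 1s seen so far; restricting the
    transition eigenvalue to [0, 1] makes this impossible."""
    h = 1.0
    for x in bits:
        h *= -1.0 if x else 1.0
    return int(h < 0)

assert parity([1, 0, 1, 1]) == 1  # three 1s -> odd parity
```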
In modern healthcare, the demand for autonomous robotic assistants has grown significantly, particularly in the operating room, where surgical tasks require precision and reliability. Robotic scrub nurses have emerged as a promising solution to improve efficiency and reduce human error during surgery. However, challenges remain in terms of accurately grasping and handing over surgical instruments, especially when dealing with complex or difficult objects in dynamic environments. In this work, we introduce a novel robotic scrub nurse system, RoboNurse-VLA, built on a Vision-Language-Action (VLA) model by integrating the Segment Anything Model 2 (SAM 2) and the Llama 2 language model. The proposed RoboNurse-VLA system enables highly precise grasping and handover of surgical instruments in real-time based on voice commands from the surgeon. Leveraging state-of-the-art vision and language models, the system can address key challenges for object detection, pose optimization, and the handling of complex and difficult-to-grasp instruments. Through extensive evaluations, RoboNurse-VLA demonstrates superior performance compared to existing models, achieving high success rates in surgical instrument handovers, even with unseen tools and challenging items. This work presents a significant step forward in autonomous surgical assistance, showcasing the potential of integrating VLA models for real-world medical applications. More details can be found at this https URL.
Early tumor detection saves lives. Each year, more than 300 million computed tomography (CT) scans are performed worldwide, offering a vast opportunity for effective cancer screening. However, detecting small or early-stage tumors on these CT scans remains challenging, even for experts. Artificial intelligence (AI) models can assist by highlighting suspicious regions, but training such models typically requires extensive tumor masks--detailed, voxel-wise outlines of tumors manually drawn by radiologists. Drawing these masks is costly, requiring years of effort and millions of dollars. In contrast, nearly every CT scan in clinical practice is already accompanied by medical reports describing the tumor's size, number, appearance, and sometimes, pathology results--information that is rich, abundant, and often underutilized for AI training. We introduce R-Super, which trains AI to segment tumors that match their descriptions in medical reports. This approach scales AI training with large collections of readily available medical reports, substantially reducing the need for manually drawn tumor masks. When trained on 101,654 reports, AI models achieved performance comparable to those trained on 723 masks. Combining reports and masks further improved sensitivity by +13% and specificity by +8%, surpassing radiologists in detecting five of the seven tumor types. Notably, R-Super enabled segmentation of tumors in the spleen, gallbladder, prostate, bladder, uterus, and esophagus, for which no public masks or AI models previously existed. This study challenges the long-held belief that large-scale, labor-intensive tumor mask creation is indispensable, establishing a scalable and accessible path toward early detection across diverse tumor types. We plan to release our trained models, code, and dataset at this https URL.
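As a heavily hedged illustration of report-level supervision, the sketch below penalizes mismatch between a predicted mask's lesion count and volume and values extracted from a report; this hypothetical form is not R-Super's actual loss and is only meant to show how report statistics could supervise segmentation.

```python
import numpy as np
from scipy import ndimage

def report_supervision_loss(pred_mask, report_count, report_volume_mm3, voxel_mm3):
    """Hypothetical report-level loss: compare the predicted mask's lesion
    count and total volume against values stated in the radiology report.
    (Non-differentiable as written; a real implementation would need smooth
    surrogates, and R-Super's actual losses may differ.)"""
    binary = pred_mask > 0.5
    _, n_lesions = ndimage.label(binary)
    volume = binary.sum() * voxel_mm3
    count_term = (n_lesions - report_count) ** 2
    volume_term = ((volume - report_volume_mm3) / report_volume_mm3) ** 2
    return count_term + volume_term
```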
The design of the humanoid ankle is critical for safe and efficient ground interaction. Key factors such as mechanical compliance and motor mass distribution have driven the adoption of parallel mechanism architectures. However, selecting the optimal configuration depends on both actuator availability and task requirements. We propose a unified methodology for the design and evaluation of parallel ankle mechanisms. A multi-objective optimization synthesizes the mechanism geometry, and the resulting solutions are evaluated using a scalar cost function that aggregates key performance metrics for cross-architecture comparison. We focus on two representative architectures: the Spherical-Prismatic-Universal (SPU) and the Revolute-Spherical-Universal (RSU). For both, we resolve the kinematics, and for the RSU, we introduce a parameterization that ensures workspace feasibility and accelerates optimization. We validate our approach by redesigning the ankle of an existing humanoid robot. The optimized RSU consistently outperforms both the original serial design and a conventionally engineered RSU, reducing the cost function by up to 41% and 14%, respectively.
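As a small illustration of the cross-architecture comparison step, here is a hypothetical scalar cost that normalizes each metric by a baseline design's value and takes a weighted sum; the paper's metrics and weighting scheme are its own.

```python
import numpy as np

def scalar_cost(metrics, baselines, weights):
    """Hypothetical aggregation for cross-architecture comparison: normalize
    each performance metric by a baseline design's value, then combine the
    normalized metrics into a single scalar via a weighted sum."""
    m = np.asarray(metrics, float) / np.asarray(baselines, float)
    return float(np.dot(np.asarray(weights, float), m))

# Example with made-up metrics (torque margin, workspace volume, inertia).
cost = scalar_cost([0.8, 1.2, 0.9], baselines=[1.0, 1.0, 1.0], weights=[0.5, 0.3, 0.2])
```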
Researchers from IIT, Queen Mary University of London, and Idiap/EPFL developed a unified formulation for visual affordance prediction and introduced the "Affordance Sheet" to promote reporting standards. This work systematically reviews existing methodologies and datasets, critically identifying pervasive reproducibility challenges across the field.
Learning robotic manipulation policies through supervised learning from demonstrations remains challenging when policies encounter execution variations not explicitly covered during training. While incorporating historical context through attention mechanisms can improve robustness, standard approaches process all past states in a sequence without explicitly modeling the temporal structure that demonstrations may include, such as failure and recovery patterns. We propose a Cross-State Transition Attention Transformer that employs a novel State Transition Attention (STA) mechanism to modulate standard attention weights based on learned state evolution patterns, enabling policies to better adapt their behavior based on execution history. Our approach combines this structured attention with temporal masking during training, where visual information is randomly removed from recent timesteps to encourage temporal reasoning from historical context. Evaluation in simulation shows that STA consistently outperforms standard cross-attention and temporal modeling approaches like TCN and LSTM networks across all tasks, achieving more than 2x improvement over cross-attention on precision-critical tasks.
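One plausible reading of the mechanism, sketched under assumptions: standard scaled dot-product cross-attention logits receive an additive per-timestep transition score before the softmax. The paper's exact modulation may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sta_attention(q, K, V, transition_scores):
    """Cross-attention over T past states whose logits are biased by learned
    state-transition scores (shape (T,)), so history is weighted according
    to learned state-evolution patterns (hypothetical additive form)."""
    logits = K @ q / np.sqrt(K.shape[-1])   # (T,) similarity logits
    logits = logits + transition_scores     # STA-style modulation
    return softmax(logits) @ V              # attended value, shape (d_v,)
```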
Deep neural networks excel in high-dimensional problems, outperforming models such as kernel methods, which suffer from the curse of dimensionality. However, the theoretical foundations of this success remain poorly understood. We follow the idea that the compositional structure of the learning task is the key factor determining when deep networks outperform other approaches. Taking a step towards formalizing this idea, we consider a simple compositional model, namely the multi-index model (MIM). In this context, we introduce and study hyper-kernel ridge regression (HKRR), an approach blending neural networks and kernel methods. Our main contribution is a sample complexity result demonstrating that HKRR can adaptively learn MIM, overcoming the curse of dimensionality. Further, we exploit the kernel nature of the estimator to develop ad hoc optimization approaches. Indeed, we contrast alternating minimization and alternating gradient methods both theoretically and numerically. These numerical results complement and reinforce our theoretical findings.
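A minimal sketch of the kernel half of the method, assuming a Gaussian kernel: with the projection B of the multi-index model f(x) = g(Bx) held fixed, fitting g reduces to ordinary kernel ridge regression on the projected inputs; an outer loop (not shown) would alternate this solve with updates of B, the two schemes the paper contrasts.

```python
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def hkrr_inner_solve(X, y, B, lam=1e-3):
    """With the projection B held fixed, fitting g in f(x) = g(Bx) is a
    standard kernel ridge regression on the projected inputs."""
    Z = X @ B.T                                   # (N, r) index space
    K = gauss_kernel(Z, Z)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha                                  # dual coefficients for g
```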
Scientific discovery is poised for rapid advancement through advanced robotics and artificial intelligence. Current scientific practices face substantial limitations as manual experimentation remains time-consuming and resource-intensive, while multidisciplinary research demands knowledge integration beyond individual researchers' expertise boundaries. Here, we envision an autonomous generalist scientist (AGS) concept that combines agentic AI and embodied robotics to automate the entire research lifecycle. This system could dynamically interact with both physical and virtual environments while facilitating the integration of knowledge across diverse scientific disciplines. By deploying these technologies throughout every research stage -- spanning literature review, hypothesis generation, experimentation, and manuscript writing -- and incorporating internal reflection alongside external feedback, this system aims to significantly reduce the time and resources needed for scientific discovery. Building on the evolution from virtual AI scientists to versatile generalist AI-based robot scientists, AGS promises groundbreaking potential. As these autonomous systems become increasingly integrated into the research process, we hypothesize that scientific discovery might adhere to new scaling laws, potentially shaped by the number and capabilities of these autonomous systems, offering novel perspectives on how knowledge is generated and evolves. The adaptability of embodied robots to extreme environments, paired with the flywheel effect of accumulating scientific knowledge, holds the promise of continually pushing beyond both physical and intellectual frontiers.