alphaXiv

History

Papers Benchmarks

Robotics at Google

2,406

07 Mar 2023

computer-science artificial-intelligence machine-learning

PaLM-E: An Embodied Multimodal Language Model

Google Research TU Berlin Robotics at Google

PaLM-E directly incorporates real-world sensor data, such as images and state estimates, into a large language model framework to enable embodied AI systems capable of understanding and interacting with their physical environment through natural language, demonstrating high success rates in robotic manipulation and state-of-the-art performance in visual question answering.

285

861

14 Jun 2021

computer-science artificial-intelligence machine-learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

University of Southern California

UC Berkeley

Stanford University

Columbia University Robotics at Google

Meta-World, an open-source benchmark featuring 50 diverse robotic manipulation tasks, provides a platform for evaluating multi-task and meta-reinforcement learning. Its evaluation shows current algorithms struggle to generalize effectively to new, distinct tasks, highlighting challenges in broad skill acquisition for robots.

1,593

1,423

26 Jul 2022

autonomous-vehicles computer-science artificial-intelligence

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

UC Berkeley University of Warsaw Robotics at Google

LM-Nav enables robots to follow natural language instructions in complex real-world outdoor environments by composing independently pre-trained large language, vision-language, and visual navigation models. The system successfully navigated a Clearpath Jackal UGV over 6 km, demonstrating an 80% success rate in executing diverse human commands without requiring task-specific fine-tuning or language-annotated robot data.

296

04 Dec 2023

computer-science artificial-intelligence robotics

RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning

Google DeepMind

UC Berkeley

Stanford University Simon Fraser University Robotics at Google

Replicating human-like dexterity in robot hands represents one of the largest open problems in robotics. Reinforcement learning is a promising approach that has achieved impressive progress in the last few years; however, the class of problems it has typically addressed corresponds to a rather narrow definition of dexterity as compared to human capabilities. To address this gap, we investigate piano-playing, a skill that challenges even the human limits of dexterity, as a means to test high-dimensional control, and which requires high spatial and temporal precision, and complex finger coordination and planning. We introduce RoboPianist, a system that enables simulated anthropomorphic hands to learn an extensive repertoire of 150 piano pieces where traditional model-based optimization struggles. We additionally introduce an open-sourced environment, benchmark of tasks, interpretable evaluation metrics, and open challenges for future study. Our website featuring videos, code, and datasets is available at this https URL

182

13 Dec 2021

computer-science computer-vision-security artificial-intelligence

XIRL: Cross-embodiment Inverse Reinforcement Learning

Stanford University Robotics at Google

XIRL proposes a self-supervised, label-free framework that learns a vision-based reward function from unlabeled video demonstrations to enable robots to learn tasks despite stark embodiment differences. It leverages Temporal Cycle-Consistency to discover an embodiment-invariant notion of task progress, facilitating more efficient reinforcement learning without requiring human-labeled frame correspondences.

30 Aug 2020

autonomous-vehicles computer-science computer-vision-security

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

Allen Institute for Artificial Intelligence

Georgia Institute of Technology Simon Fraser University Facebook AI Research Robotics at Google

A multi-institutional working group provides consensus recommendations to standardize Object-Goal Navigation (ObjectNav), addressing inconsistencies in task definition and evaluation. The paper precisely defines success criteria, agent embodiment parameters, and environment characteristics, culminating in concrete instantiations for the Habitat and RoboTHOR 2020 challenges.

31 Oct 2021

computer-science artificial-intelligence machine-learning

Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

Stanford University Robotics at Google

We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot's observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot's observation space. To scalably learn this grounding we propose to leverage offline robot datasets (including highly sub-optimal, autonomously collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%, and is able to perform visuomotor tasks from natural language, such as "open the right drawer" and "move the stapler", on a Franka Emika Panda robot.

11 Dec 2023

computer-science artificial-intelligence computation-and-language

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

UC Berkeley

Stanford University

Google Research Robotics at Google

Grounded Decoding introduces a method to guide large language models in generating physically realizable plans for embodied agents by integrating token-level feedback from diverse grounded models. This approach enables robots to understand and execute complex, open-ended natural language instructions more effectively and efficiently in physical environments.

30 Jul 2020

computer-science machine-learning neural-and-evolutionary-computing

Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

Columbia University Robotics at Google

Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real data.

17 Jun 2022

computer-science artificial-intelligence machine-learning

The State of Sparse Training in Deep Reinforcement Learning

Google Research Robotics at Google Adept AI

The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as in an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain - sparse networks perform better than dense networks for the same parameter count - in the DRL domain. We provide detailed analyses on how the various components in DRL are affected by the use of sparse networks and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods, as well as for advancing their use in DRL.

14 Nov 2022

computer-science artificial-intelligence machine-learning

QuaRL: Quantization for Fast and Environmentally Sustainable Reinforcement Learning

Harvard University BITS Pilani, Goa Robotics at Google

Deep reinforcement learning continues to show tremendous potential in achieving task-level autonomy, however, its computational and energy demands remain prohibitively high. In this paper, we tackle this problem by applying quantization to reinforcement learning. To that end, we introduce a novel Reinforcement Learning (RL) training paradigm, \textit{ActorQ}, to speed up actor-learner distributed RL training. \textit{ActorQ} leverages 8-bit quantized actors to speed up data collection without affecting learning convergence. Our quantized distributed RL training system, \textit{ActorQ}, demonstrates end-to-end speedups \blue{between 1.5

\times

and 5.41

\times

}, and faster convergence over full precision training on a range of tasks (Deepmind Control Suite) and different RL algorithms (D4PG, DQN). Furthermore, we compare the carbon emissions (Kgs of CO2) of \textit{ActorQ} versus standard reinforcement learning \blue{algorithms} on various tasks. Across various settings, we show that \textit{ActorQ} enables more environmentally friendly reinforcement learning by achieving \blue{carbon emission improvements between 1.9

\times

and 3.76

\times

} compared to training RL-agents in full-precision. We believe that this is the first of many future works on enabling computationally energy-efficient and sustainable reinforcement learning. The source code is available here for the public to use: \url{this https URL}.

14 May 2021

computer-science computer-vision-and-pattern-recognition representation-learning

SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping

Google Research Waymo Robotics at Google

We present SMURF, a method for unsupervised learning of optical flow that improves state of the art on all benchmarks by

36\%

40\%

(over the prior best method UFlow) and even outperforms several supervised approaches such as PWC-Net and FlowNet2. Our method integrates architecture improvements from supervised optical flow, i.e. the RAFT model, with new ideas for unsupervised learning that include a sequence-aware self-supervision loss, a technique for handling out-of-frame motion, and an approach for learning effectively from multi-frame video data while still only requiring two frames for inference.

26 Jul 2020

computer-science computer-vision-and-pattern-recognition machine-learning

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

Technical University of Munich Google AI Robotics at Google

Mask2CAD, developed by researchers at Google AI and Technical University of Munich, accurately predicts 3D object shapes and poses from single RGB images by retrieving clean CAD models. The method achieves nearly double the APmesh compared to Mesh R-CNN on the Pix3D dataset, reaching 33.2%, and operates at 60 milliseconds per image while generalizing to unseen 3D models.

31 May 2021

computer-science robotics

SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

Stanford University Robotics at Google Everyday Robots, X The Moonshot Factory

SimGAN addresses the sim-to-real gap for robot control by identifying a hybrid physics simulator whose generated trajectories are indistinguishable from real-world data. This is achieved through an adversarial reinforcement learning framework, enabling more accurate and generalizable policy transfer while reducing the need for extensive real-world data.

03 Mar 2022

computer-science robotics

Implicit Kinematic Policies: Unifying Joint and Cartesian Action Spaces in End-to-End Robot Learning

Stanford University Robotics at Google AUTOLab at UC Berkeley

Action representation is an important yet often overlooked aspect in end-to-end robot learning with deep networks. Choosing one action space over another (e.g. target joint positions, or Cartesian end-effector poses) can result in surprisingly stark performance differences between various downstream tasks -- and as a result, considerable research has been devoted to finding the right action space for a given application. However, in this work, we instead investigate how our models can discover and learn for themselves which action space to use. Leveraging recent work on implicit behavioral cloning, which takes both observations and actions as input, we demonstrate that it is possible to present the same action in multiple different spaces to the same policy -- allowing it to learn inductive patterns from each space. Specifically, we study the benefits of combining Cartesian and joint action spaces in the context of learning manipulation skills. To this end, we present Implicit Kinematic Policies (IKP), which incorporates the kinematic chain as a differentiable module within the deep network. Quantitative experiments across several simulated continuous control tasks -- from scooping piles of small objects, to lifting boxes with elbows, to precise block insertion with miscalibrated robots -- suggest IKP not only learns complex prehensile and non-prehensile manipulation from pixels better than baseline alternatives, but also can learn to compensate for small joint encoder offset errors. Finally, we also run qualitative experiments on a real UR5e to demonstrate the feasibility of our algorithm on a physical robotic system with real data. See this https URL for code and supplementary material.

08 Jun 2024

active-learning computer-science machine-learning

Safely Learning Dynamical Systems

Princeton University Robotics at Google

A fundamental challenge in learning an unknown dynamical system is to reduce model uncertainty by making measurements while maintaining safety. We formulate a mathematical definition of what it means to safely learn a dynamical system by sequentially deciding where to initialize trajectories. The state of the system must stay within a safety region for a horizon of

T

time steps under the action of all dynamical systems that (i) belong to a given initial uncertainty set, and (ii) are consistent with information gathered so far. First, we consider safely learning a linear dynamical system involving

n

states. For the case

T=1

, we present an LP-based algorithm that either safely recovers the true dynamics from at most

n

trajectories, or certifies that safe learning is impossible. For

T=2

, we give an SDP representation of the set of safe initial conditions and show that

\lceil n/2 \rceil

trajectories generically suffice for safe learning. For

T = \infty

, we provide SDP-representable inner approximations of the set of safe initial conditions and show that one trajectory generically suffices for safe learning. We extend a number of our results to the cases where the initial uncertainty set contains sparse, low-rank, or permutation matrices, or when the system has a control input. Second, we consider safely learning a general class of nonlinear dynamical systems. For the case

T=1

, we give an SOCP-based representation of the set of safe initial conditions. For

T=\infty

, we provide semidefinite representable inner approximations to the set of safe initial conditions. We show how one can safely collect trajectories and fit a polynomial model of the nonlinear dynamics that is consistent with the initial uncertainty set and best agrees with the observations. We also present some extensions to cases where the measurements are noisy or the dynamical system involves disturbances.

07 Nov 2020

computer-science computer-vision-and-pattern-recognition graphics

Unsupervised Monocular Depth Learning in Dynamic Scenes

Google Research Waymo LLC Robotics at Google

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be constant for rigid moving objects. We show that this regularization alone is sufficient to train monocular depth prediction models that exceed the accuracy achieved in prior work for dynamic scenes, including methods that require semantic input. Code is at this https URL .

16 Feb 2020

computer-science machine-learning robotics

Learning Fast Adaptation with Meta Strategy Optimization

Georgia Institute of Technology Robotics at Google X

The ability to walk in new scenarios is a key milestone on the path toward real-world applications of legged robots. In this work, we introduce Meta Strategy Optimization, a meta-learning algorithm for training policies with latent variable inputs that can quickly adapt to new scenarios with a handful of trials in the target environment. The key idea behind MSO is to expose the same adaptation process, Strategy Optimization (SO), to both the training and testing phases. This allows MSO to effectively learn locomotion skills as well as a latent space that is suitable for fast adaptation. We evaluate our method on a real quadruped robot and demonstrate successful adaptation in various scenarios, including sim-to-real transfer, walking with a weakened motor, or climbing up a slope. Furthermore, we quantitatively analyze the generalization capability of the trained policy in simulated environments. Both real and simulated experiments show that our method outperforms previous methods in adaptation to novel tasks.

09 Nov 2020

computer-science artificial-intelligence machine-learning

Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous

MIT Robotics at Google

Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point to point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn via self-supervision motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.

15 Jan 2021

autonomous-vehicles computer-science artificial-intelligence

Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

Harvard University

The University of Texas at Austin

Delft University of Technology Robotics at Google

We present fully autonomous source seeking onboard a highly constrained nano quadcopter, by contributing application-specific system and observation feature design to enable inference of a deep-RL policy onboard a nano quadcopter. Our deep-RL algorithm finds a high-performance solution to a challenging problem, even in presence of high noise levels and generalizes across real and simulation environments with different obstacle configurations. We verify our approach with simulation and in-field testing on a Bitcraze CrazyFlie using only the cheap and ubiquitous Cortex-M4 microcontroller unit. The results show that by end-to-end application-specific system design, our contribution consumes almost three times less additional power, as compared to competing learning-based navigation approach onboard a nano quadcopter. Thanks to our observation space, which we carefully design within the resource constraints, our solution achieves a 94% success rate in cluttered and randomized test environments, as compared to the previously achieved 80%. We also compare our strategy to a simple finite state machine (FSM), geared towards efficient exploration, and demonstrate that our policy is more robust and resilient at obstacle avoidance as well as up to 70% more efficient in source seeking. To this end, we contribute a cheap and lightweight end-to-end tiny robot learning (tinyRL) solution, running onboard a nano quadcopter, that proves to be robust and efficient in a challenging task using limited sensory input.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

PaLM-E: An Embodied Multimodal Language Model

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning

XIRL: Cross-embodiment Inverse Reinforcement Learning

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

The State of Sparse Training in Deep Reinforcement Learning

QuaRL: Quantization for Fast and Environmentally Sustainable Reinforcement Learning

SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

Implicit Kinematic Policies: Unifying Joint and Cartesian Action Spaces in End-to-End Robot Learning

Safely Learning Dynamical Systems

Unsupervised Monocular Depth Learning in Dynamic Scenes

Learning Fast Adaptation with Meta Strategy Optimization

Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous

Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

Events

AI for Law

Personalize Your Feed