Beijing Innovation Center of Humanoid Robotics Co. Ltd.

MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation

10 Jul 2025

Wuhan University The Hong Kong University of Science and Technology (Guangzhou)

MapNav introduces Annotated Semantic Maps (ASMs) as a memory representation for Vision-and-Language Navigation (VLN), enabling Vision-Language Models (VLMs) to interpret structured spatial information through explicit textual annotations. This approach achieves state-of-the-art performance, boosting Success Rate by 23.5% and Success weighted Path Length by 26.5% on R2R over prior methods while reducing memory consumption to a constant 0.17MB and inference time by 79.5%.
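The two metrics quoted above can be made concrete with a small sketch. It uses the standard definitions of Success Rate (SR) and Success weighted by Path Length (SPL); the function and argument names are illustrative and are not taken from the MapNav code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length.

    Each successful episode contributes l / max(p, l), where l is the
    shortest-path length to the goal and p is the length of the path the
    agent actually took. Failed episodes contribute 0.
    Assumes every shortest-path length is positive.
    """
    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if ok:
            total += shortest / max(taken, shortest)
    return total / len(successes)
```

An agent that succeeds but takes a path twice as long as the shortest one earns only 0.5 for that episode, so SPL rewards efficiency as well as success.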

#computer-science #robotics

Omni-Perception: Omnidirectional Collision Avoidance for Legged Locomotion in Dynamic Environments

28 Aug 2025

Tsinghua University The Hong Kong University of Science and Technology (Guangzhou)

Researchers at The Hong Kong University of Science and Technology (Guangzhou) developed Omni-Perception, an end-to-end reinforcement learning framework enabling legged robots to perform omnidirectional collision avoidance by directly processing raw, spatio-temporal LiDAR data. The system achieved 70% success avoiding aerial obstacles and 90% for moving humans in real-world tests, significantly outperforming a native robot system, and incorporates a custom high-fidelity LiDAR simulator and a novel hierarchical perception network.

#computer-science #robotics

DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction

10 Oct 2025

Beijing Innovation Center of Humanoid Robotics Co. Ltd. The Hong Kong University, China

Researchers developed a system for depth-only perceptive locomotion in humanoid robots across challenging terrains, integrating realistic depth synthesis and a cross-attention transformer for terrain reconstruction. The approach enables accurate inference of occluded regions and robust, adaptive locomotion with a perception latency of approximately 20 ms in real-world deployment.

#computer-science #robotics

TopoNav: Topological Graphs as a Key Enabler for Advanced Object Navigation

01 Sep 2025

Tsinghua University The Hong Kong University of Science and Technology (Guangzhou)

TopoNav introduces a framework for object navigation that utilizes dynamic, evolving topological graphs as a spatial memory mechanism, enabling state-of-the-art zero-shot performance on HM3D and MP3D datasets. The approach enhances an agent's ability to retain and reason with environmental structure, demonstrating an SR of 0.601 on HM3D and successfully implementing the system on a quadruped robot.
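As a rough illustration of a topological graph used as spatial memory, the sketch below stores places as labeled nodes, links places the agent has traveled between, and plans over the graph with breadth-first search. The class name, the labels, and the choice of BFS are assumptions made for this example; the summary above does not describe TopoNav's actual data structure or planner.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class TopoGraph:
    """Toy topological memory: nodes are places, edges are traversable links."""

    def __init__(self) -> None:
        self.labels: Dict[int, str] = {}
        self.edges: Dict[int, Set[int]] = {}

    def add_node(self, node_id: int, label: str) -> None:
        self.labels[node_id] = label
        self.edges.setdefault(node_id, set())

    def connect(self, a: int, b: int) -> None:
        # Both nodes must already exist; links are undirected.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def shortest_path(self, start: int, goal: int) -> Optional[List[int]]:
        """Breadth-first search; returns node ids from start to goal, or None."""
        parent: Dict[int, Optional[int]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path: List[int] = []
                current: Optional[int] = node
                while current is not None:
                    path.append(current)
                    current = parent[current]
                return path[::-1]
            for nxt in sorted(self.edges[node]):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None
```

Because the graph grows as nodes are added during exploration, the same structure can be queried at any time for a route back to a previously seen place.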

#computer-science #robotics

Distillation-PPO: A Novel Two-Stage Reinforcement Learning Framework for Humanoid Robot Perceptive Locomotion

11 Mar 2025

The Hong Kong University of Science and Technology (Guangzhou) Beijing Innovation Center of Humanoid Robotics Co. Ltd.

In recent years, humanoid robots have garnered significant attention from both academia and industry due to their high adaptability to environments and human-like characteristics. With the rapid advancement of reinforcement learning, substantial progress has been made in the walking control of humanoid robots. However, existing methods still face challenges when dealing with complex environments and irregular terrains. In the field of perceptive locomotion, existing approaches are generally divided into two-stage methods and end-to-end methods. Two-stage methods first train a teacher policy in a simulated environment and then use distillation techniques, such as DAgger, to transfer the privileged information learned as latent features or actions to the student policy. End-to-end methods, on the other hand, forgo the learning of privileged information and directly learn policies from a partially observable Markov decision process (POMDP) through reinforcement learning. However, due to the lack of supervision from a teacher policy, end-to-end methods often face difficulties in training and exhibit unstable performance in real-world applications. This paper proposes an innovative two-stage perceptive locomotion framework that combines the advantages of teacher policies learned in a fully observable Markov decision process (MDP) to regularize and supervise the student policy. At the same time, it leverages the characteristics of reinforcement learning to ensure that the student policy can continue to learn in a POMDP, thereby enhancing the model's upper bound. Our experimental results demonstrate that our two-stage training framework achieves higher training efficiency and stability in simulated environments, while also exhibiting better robustness and generalization capabilities in real-world applications.
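One way to picture a student objective that is both supervised by a teacher and still trained with reinforcement learning is a weighted sum of a PPO clipped-surrogate term and an action-imitation term. The sketch below is a minimal illustration under that assumption; the weighting scheme, the mean-squared-error imitation term, and all names (`Sample`, `combined_loss`, `distill_weight`) are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    logp_new: float                  # log-prob of the action under the current student
    logp_old: float                  # log-prob under the student that collected the data
    advantage: float                 # advantage estimate for the action
    student_action: Sequence[float]  # student's action mean
    teacher_action: Sequence[float]  # teacher's action for the same state


def ppo_clip_term(logp_new: float, logp_old: float, advantage: float,
                  clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def distill_term(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between student and teacher actions (assumed form)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def combined_loss(batch: List[Sample], distill_weight: float = 1.0,
                  clip_eps: float = 0.2) -> float:
    """Loss to minimize: negative PPO surrogate plus weighted imitation error."""
    n = len(batch)
    rl = sum(ppo_clip_term(s.logp_new, s.logp_old, s.advantage, clip_eps)
             for s in batch) / n
    bc = sum(distill_term(s.student_action, s.teacher_action) for s in batch) / n
    return -rl + distill_weight * bc
```

In this sketch, setting `distill_weight` to zero recovers plain PPO, while a large value makes the student mostly imitate the teacher.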

#computer-science #robotics

LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures

11 Mar 2025

The Hong Kong University of Science and Technology (Guangzhou) Beijing Innovation Center of Humanoid Robotics Co. Ltd.

LiPS integrates detailed multi-rigid-body dynamics of humanoid robot parallel ankle mechanisms directly into large-scale GPU-accelerated reinforcement learning simulations. This approach enables policies to learn parallel-aware control strategies, significantly reducing the sim-to-real gap and allowing direct, robust deployment on physical humanoid robots such as the Tien Kung.

#computer-science #robotics

Trinity: A Modular Humanoid Robot AI System

11 Mar 2025

The Hong Kong University of Science and Technology (Guangzhou) Beijing Innovation Center of Humanoid Robotics Co. Ltd.

In recent years, research on humanoid robots has garnered increasing attention. With breakthroughs in various types of artificial intelligence algorithms, embodied intelligence, exemplified by humanoid robots, has been highly anticipated. Advances in reinforcement learning (RL) algorithms have significantly improved the motion control and generalization capabilities of humanoid robots. Simultaneously, progress in large language models (LLMs) and visual language models (VLMs) has opened up more possibilities for humanoid robots. LLMs enable humanoid robots to understand complex tasks from language instructions and perform long-term task planning, while VLMs greatly enhance the robots' understanding of and interaction with their environment. This paper introduces Trinity, a novel AI system for humanoid robots that integrates RL, LLMs, and VLMs. By combining these technologies, Trinity enables efficient control of humanoid robots in complex environments. This approach not only enhances the robots' capabilities but also opens new avenues for future research and applications of humanoid robotics.

#computer-science #robotics

Spiking Neural Network as Adaptive Event Stream Slicer

24 Mar 2025

Northeastern University The Hong Kong University of Science and Technology (Guangzhou)

Event-based cameras are attracting significant interest as they provide rich edge information, high dynamic range, and high temporal resolution. Many state-of-the-art event-based algorithms rely on splitting the events into fixed groups, which omits crucial temporal information, particularly in diverse motion scenarios (e.g., high/low speed). In this work, we propose SpikeSlicer, a newly designed plug-and-play event processing method capable of splitting event streams adaptively. SpikeSlicer utilizes a low-energy spiking neural network (SNN) to trigger event slicing. To guide the SNN to fire spikes at optimal time steps, we propose the Spiking Position-aware Loss (SPA-Loss) to modulate the neuron's state. Additionally, we develop a Feedback-Update training strategy that refines the slicing decisions using feedback from the downstream artificial neural network (ANN). Extensive experiments demonstrate that our method yields significant performance improvements in event-based object tracking and recognition. Notably, SpikeSlicer provides a new SNN-ANN cooperation paradigm, in which the SNN acts as an efficient, low-energy data processor that helps the ANN improve downstream performance, opening new perspectives and avenues of exploration. Our code is available at this https URL
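To contrast fixed grouping with spike-triggered slicing, the toy sketch below cuts an event stream in two ways: every fixed number of events, and whenever a single leaky integrate-and-fire neuron driven by per-bin event counts crosses a threshold. This is only an illustration of the general idea. It is not the SpikeSlicer network, it involves no SPA-Loss or Feedback-Update training, and the threshold and leak values are arbitrary assumptions.

```python
from typing import List, Sequence


def fixed_count_slices(timestamps: Sequence[float],
                       events_per_slice: int) -> List[List[float]]:
    """Baseline: start a new slice every `events_per_slice` events."""
    return [list(timestamps[i:i + events_per_slice])
            for i in range(0, len(timestamps), events_per_slice)]


def lif_triggered_boundaries(bin_counts: Sequence[int], threshold: float,
                             leak: float = 0.9) -> List[int]:
    """Toy adaptive slicing with one leaky integrate-and-fire neuron.

    The membrane potential decays by `leak` each time bin and is driven by
    the number of events in that bin. When it reaches `threshold`, the
    neuron fires: the current bin index is recorded as a slice boundary
    and the potential is reset to zero.
    """
    boundaries: List[int] = []
    potential = 0.0
    for t, count in enumerate(bin_counts):
        potential = leak * potential + count
        if potential >= threshold:
            boundaries.append(t)
            potential = 0.0
    return boundaries
```

With this scheme, bursts of activity close a slice quickly while quiet periods stretch it out, which is the behavior fixed grouping cannot provide.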

#computer-science #computer-vision-and-pattern-recognition #neural-and-evolutionary-computing

Event Masked Autoencoder: Point-wise Action Recognition with Event-Based Cameras

02 Jan 2025

Hong Kong University of Science and Technology (Guangzhou) Beijing Innovation Center of Humanoid Robotics Co. Ltd.

Dynamic vision sensors (DVS) are bio-inspired devices that capture visual information in the form of asynchronous events, which encode changes in pixel intensity with high temporal resolution and low latency. These events provide rich motion cues that can be exploited for various computer vision tasks, such as action recognition. However, most existing DVS-based action recognition methods lose temporal information during data transformation or suffer from noise and outliers caused by sensor imperfections or environmental factors. To address these challenges, we propose a novel framework that preserves and exploits the spatiotemporal structure of event data for action recognition. Our framework consists of two main components: 1) a point-wise event masked autoencoder (MAE) that learns a compact and discriminative representation of event patches by reconstructing them from masked raw event camera point data; 2) an improved event point patch generation algorithm that leverages an event data inlier model and point-wise data augmentation techniques to enhance the quality and diversity of event point patches. To the best of our knowledge, our approach is the first to apply pre-training to raw event camera point data, and we propose a novel event point patch embedding that allows transformer-based models to be used with event cameras.
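The masking step of a masked autoencoder can be sketched generically: pick a random subset of patches to hide, feed only the visible ones to the encoder, and train the decoder to reconstruct the hidden ones. The helper below shows only the index selection. The function name and the mask ratio used in the example are assumptions, and the abstract does not state how the authors select masked event patches.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(num_patches: int, mask_ratio: float,
                      seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked) sets uniformly at random.

    `mask_ratio` is the fraction of patches hidden from the encoder;
    the masked count is rounded down.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked
```

For example, `random_patch_mask(8, 0.75, seed=0)` hides six of eight patches and leaves two visible; which indices end up hidden depends on the seed.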

#computer-science #computer-vision-security #computer-vision-and-pattern-recognition

Multi-Floor Zero-Shot Object Navigation Policy

17 Sep 2024

The Hong Kong University of Science and Technology (Guangzhou) Beijing Innovation Center of Humanoid Robotics Co. Ltd.

Object navigation in multi-floor environments presents a formidable challenge in robotics, requiring sophisticated spatial reasoning and adaptive exploration strategies. Traditional approaches have primarily focused on single-floor scenarios, overlooking the complexities introduced by multi-floor structures. To address these challenges, we propose a Multi-floor Navigation Policy (MFNP) and implement it in zero-shot object navigation tasks. Our framework comprises three key components: (i) Multi-floor Navigation Policy, which enables an agent to explore across multiple floors; (ii) Multi-modal Large Language Models (MLLMs) for reasoning in the navigation process; and (iii) Inter-Floor Navigation, ensuring efficient floor transitions. We evaluate MFNP on the Habitat-Matterport 3D (HM3D) and Matterport 3D (MP3D) datasets, both of which include multi-floor scenes. Our experimental results demonstrate that MFNP significantly outperforms all existing methods in zero-shot object navigation, achieving higher success rates and improved exploration efficiency. Ablation studies further highlight the effectiveness of each component in addressing the unique challenges of multi-floor navigation. We also conducted real-world experiments to evaluate the feasibility of our policy. With MFNP deployed, a Unitree quadruped robot successfully navigated across multiple floors and found the target object in a completely unseen environment. By introducing MFNP, we offer a new paradigm for tackling complex, multi-floor environments in object navigation tasks, opening avenues for future research in vision-based navigation in realistic, multi-floor settings.

#computer-science #robotics

ES-Parkour: Advanced Robot Parkour with Bio-inspired Event Camera and Spiking Neural Network

19 Mar 2025

The Hong Kong University of Science and Technology (Guangzhou) Beijing Innovation Center of Humanoid Robotics Co. Ltd.

In recent years, quadruped robotics has advanced significantly, particularly in perception and motion control via reinforcement learning, enabling complex motions in challenging environments. Visual sensors like depth cameras enhance stability and robustness but face limitations, such as low operating frequencies relative to joint control and sensitivity to lighting, which hinder outdoor deployment. Additionally, deep neural networks in sensor and control systems increase computational demands. To address these issues, we introduce spiking neural networks (SNNs) and event cameras to perform a challenging quadruped parkour task. Event cameras capture dynamic visual data, while SNNs efficiently process spike sequences, mimicking biological perception. Experimental results demonstrate that this approach significantly outperforms traditional models, achieving excellent parkour performance with just 11.7% of the energy consumption of an artificial neural network (ANN)-based model, yielding an 88.3% energy reduction. By integrating event cameras with SNNs, our work advances robotic reinforcement learning and opens new possibilities for applications in demanding environments.

#computer-science #computer-vision-and-pattern-recognition #machine-learning

Omni-Perception: Omnidirectional Collision Avoidance for Legged Locomotion in Dynamic Environments

28 Aug 2025

Tsinghua University The Hong Kong University of Science and Technology (Guangzhou)

Agile locomotion in complex 3D environments requires robust spatial awareness to safely avoid diverse obstacles such as aerial clutter, uneven terrain, and dynamic agents. Depth-based perception approaches often struggle with sensor noise, lighting variability, computational overhead from intermediate representations (e.g., elevation maps), and difficulties with non-planar obstacles, limiting performance in unstructured environments. In contrast, direct integration of LiDAR sensing into end-to-end learning for legged locomotion remains underexplored. We propose Omni-Perception, an end-to-end locomotion policy that achieves 3D spatial awareness and omnidirectional collision avoidance by directly processing raw LiDAR point clouds. At its core is PD-RiskNet (Proximal-Distal Risk-Aware Hierarchical Network), a novel perception module that interprets spatio-temporal LiDAR data for environmental risk assessment. To facilitate efficient policy learning, we develop a high-fidelity LiDAR simulation toolkit with realistic noise modeling and fast raycasting, compatible with platforms such as Isaac Gym, Genesis, and MuJoCo, enabling scalable training and effective sim-to-real transfer. Learning reactive control policies directly from raw LiDAR data enables the robot to navigate complex environments with static and dynamic obstacles more robustly than approaches relying on intermediate maps or limited sensing. We validate Omni-Perception through real-world experiments and extensive simulation, demonstrating strong omnidirectional avoidance capabilities and superior locomotion performance in highly dynamic environments.

#computer-science #robotics

DEL: Discrete Element Learner for Learning 3D Particle Dynamics with Neural Rendering

11 Oct 2024

Northeastern University Hong Kong University of Science and Technology (Guangzhou)

Learning-based simulators show great potential for simulating particle dynamics when 3D ground truth is available, but per-particle correspondences are not always accessible. The development of neural rendering offers a new way to learn 3D dynamics from 2D images by inverse rendering. However, existing approaches remain ill-posed because of 2D-to-3D uncertainty: for example, a given 2D image can correspond to many 3D particle distributions. To mitigate this uncertainty, we take a conventional, mechanically interpretable framework as a physical prior and extend it to a learning-based version. In brief, we incorporate learnable graph kernels into the classic Discrete Element Analysis (DEA) framework to implement a novel mechanics-integrated learning system. Here, the graph network kernels are used only to approximate specific mechanical operators in the DEA framework rather than the whole dynamics mapping. By integrating strong physics priors, our method can effectively learn the dynamics of various materials from partial 2D observations in a unified manner. Experiments show that our approach outperforms other learned simulators by a large margin in this setting and is robust to different renderers, fewer training samples, and fewer camera views.

#computer-science #computer-vision-and-pattern-recognition #graphics
