alphaXiv

Simon Fraser University

885

23 Sep 2025

computer-science computer-vision-and-pattern-recognition graphics

Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

University of Toronto

NVIDIA Vector Institute Simon Fraser University

Lyra generates high-quality, geometrically consistent 3D and 4D scenes by distilling implicit 3D knowledge from a video diffusion model into explicit 3D Gaussian Splatting representations, eliminating the reliance on real-world multi-view training data. The framework achieves state-of-the-art results on several 3D reconstruction benchmarks and offers real-time rendering capabilities.

447

1,708

05 May 2025

computer-science computer-vision-and-pattern-recognition machine-learning

TWIST: Teleoperated Whole-Body Imitation System

Stanford University Simon Fraser University

A teleoperation system enables real-time whole-body control of humanoid robots through human motion imitation, combining reinforcement learning with behavior cloning to achieve coordinated movements across diverse tasks while maintaining a 0.9-second latency on the Unitree G1 platform.

408

510

04 Oct 2025

adversarial-attacks computer-science artificial-intelligence

Physics-Based Motion Imitation with Adversarial Differential Discriminators

NVIDIA Simon Fraser University Sony PlayStation

This paper introduces the Adversarial Differential Discriminator (ADD), an adversarial multi-objective optimization technique for physics-based character animation and multi-objective reinforcement learning. The method enables simulated characters to precisely replicate agile motions with quality comparable to state-of-the-art tracking methods, without requiring manual reward engineering, and also demonstrates broader applicability in robotics control.

796

25 Sep 2024

computer-science computer-vision-security artificial-intelligence

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Rawal Khirodkar

Zhengyi Luo

Ego-Exo4D introduces the largest public dataset of time-synchronized, multimodal, multiview ego-exocentric video, capturing 740 participants performing skilled activities across 8 diverse domains in 123 natural environments. The dataset, a collaboration of 15 institutions, includes Project Aria data and extensive language annotations, supporting four benchmark families for understanding human skill.

2,382

19 Apr 2025

computer-science robotics

Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning

Google DeepMind

Carnegie Mellon University

Georgia Institute of Technology

Stanford University Harbin Institute of Technology

The University of Texas at Austin

NVIDIA

Duke University Simon Fraser University Technische Universität München CNRS-AIST Joint Robotics Laboratory The Institute for Human and Machine Cognition CNRS-University of Montpellier LIRMM The University of Southern California The AI Institute

Researchers from Georgia Institute of Technology, Harbin Institute of Technology, Google DeepMind, and others compiled a comprehensive survey of humanoid locomotion and manipulation. It integrates traditional model-based methods with learning-based techniques and explores the emerging role of foundation models, highlighting the critical importance of whole-body tactile feedback.

586

04 Sep 2025

computer-science robotics

GMT: General Motion Tracking for Humanoid Whole-Body Control

University of California, San Diego Simon Fraser University

The GMT framework introduces a method for training a single, unified policy that enables humanoid robots to track a wide range of human motions with high fidelity in the real world. This approach, developed by researchers from UC San Diego and Simon Fraser University, achieves state-of-the-art tracking performance in simulation and demonstrates robust reproduction of diverse skills on a Unitree G1 robot.

303

196

29 Sep 2025

computer-science computer-vision-and-pattern-recognition geometric-deep-learning

Triangle Splatting+: Differentiable Rendering with Opaque Triangles

University of Toronto

University of British Columbia

University of Maryland Simon Fraser University

Adobe University of Liège

Triangle Splatting+, from a collaboration including University of Liège, Simon Fraser University, University of Maryland, and Adobe Research, optimizes explicit, opaque, and semi-connected triangle meshes for novel view synthesis. This method achieves state-of-the-art visual quality among mesh-based techniques, yielding outputs directly compatible with traditional game engines without post-processing for mesh extraction or coloring, and trains in approximately 39 minutes for Mip-NeRF360 scenes.

616

25 Sep 2025

agents computer-science artificial-intelligence

LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

Carnegie Mellon University Simon Fraser University University of California Berkeley Norwegian University of Science and Technology

LeVERB is a framework designed for humanoid robots to execute agile whole-body actions through latent vision-language instructions, enabling zero-shot sim-to-real transfer. The framework achieved an average 58.5% success rate across diverse tasks in simulation, which is a 7.8 times improvement over a naive hierarchical VLA, and successfully generalized to unseen commands on a Unitree G1 robot.

177

16 Oct 2025

agents computer-science graphics

MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control

NVIDIA Simon Fraser University

MimicKit is an open-source framework for training motion controllers using motion imitation and reinforcement learning. The codebase provides implementations of commonly-used motion-imitation techniques and RL algorithms. This framework is intended to support research and applications in computer graphics and robotics by providing a unified training framework, along with standardized environment, agent, and data structures. The codebase is designed to be modular and easily configurable, enabling convenient modification and extension to new characters and tasks. The open-source codebase is available at: this https URL.

617

769

16 Jun 2025

agents chain-of-thought computer-science

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Shanghai AI Lab

Nanyang Technological University Simon Fraser University A*STAR Singapore

Ego-R1 introduces a Chain-of-Tool-Thought framework that dynamically orchestrates specialized perception tools for reasoning over ultra-long egocentric videos spanning days and weeks. The approach achieves 46.0% accuracy on a new week-long benchmark, surpassing state-of-the-art models like Gemini-1.5-Pro.

895

12 Feb 2025

computer-science computer-vision-and-pattern-recognition generative-models

3D Gaussian Splatting as Markov Chain Monte Carlo

University of Toronto

Google DeepMind

Google Research

University of British Columbia Simon Fraser University

Researchers from UBC, Google Research, and Google DeepMind introduce an approach that reinterprets 3D Gaussian Splatting (3DGS) optimization as a Markov Chain Monte Carlo (MCMC) process, leveraging Stochastic Gradient Langevin Dynamics (SGLD) for principled exploration. This method achieves higher rendering quality and robustness to initialization, consistently outperforming conventional 3DGS across datasets like NeRF Synthetic and MipNeRF 360, while maintaining real-time inference speeds.

226

15 Sep 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data

National Research Council Canada Simon Fraser University Electronic Arts

Yi Shi

Yuxuan Mu

Researchers developed StableMotion, a framework for training motion cleanup models directly on unpaired, corrupted motion data using a diffusion-based approach with quality indicator variables. This method achieved a 68% reduction in motion pops and an 81% reduction in frozen frames on a proprietary soccer mocap dataset, while also outperforming state-of-the-art cleanup models on controlled benchmarks.

144

1,816

04 Oct 2024

computer-science computer-vision-and-pattern-recognition generative-models

CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control

Carnegie Mellon University

Tel Aviv University

University of British Columbia

NVIDIA Simon Fraser University

Zhengyi Luo

The CLoSD system introduces a method for controlling virtual characters by closing the loop between motion planning and physical execution. It combines an auto-regressive diffusion model for real-time motion planning with a physics-based reinforcement learning controller, enabling characters to perform diverse, physically plausible actions in response to text commands and interact realistically with environments.

218

1,070

03 Nov 2024

computer-science computer-vision-and-pattern-recognition machine-learning

BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry

Simon Fraser University Autodesk Research

BrepGen introduces a generative diffusion model capable of directly synthesizing industrially-standard Boundary representation (B-rep) 3D models. It achieves this by employing a novel structured latent geometry representation that implicitly encodes topology, enabling the generation of complex, watertight models including free-form surfaces, and demonstrates capabilities like design autocompletion and interpolation.

260

376

06 May 2025

computer-science artificial-intelligence graphics

PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers

NVIDIA Simon Fraser University

Yi Shi

A framework called PARC enables training of agile terrain traversal controllers from small motion datasets through iterative co-training between a diffusion-based motion generator and physics-based tracking controller, progressively expanding the motion repertoire while maintaining physical plausibility through simulation-based correction.

806

04 Apr 2024

computer-science computer-vision-and-pattern-recognition machine-learning

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

University of Toronto Simon Fraser University

pixelSplat, developed by researchers at MIT and Simon Fraser University, reconstructs 3D Gaussian Splatting representations from just two input images in a single feed-forward pass. It achieves real-time novel view synthesis with explicit 3D scene geometry, rendering images approximately 650 times faster than prior state-of-the-art generalizable methods while improving perceptual quality.

993

1,193

09 Sep 2025

computer-science computer-vision-and-pattern-recognition machine-learning

Generalizable Humanoid Manipulation with 3D Diffusion Policies

University of Illinois at Urbana-Champaign

Stanford University CMU Simon Fraser University UPenn

Humanoid robots capable of autonomous operation in diverse environments have long been a goal for roboticists. However, autonomous manipulation by humanoid robots has largely been restricted to one specific scene, primarily due to the difficulty of acquiring generalizable skills and the high cost of in-the-wild humanoid robot data. In this work, we build a real-world robotic system to address this challenging problem. Our system mainly integrates 1) a whole-upper-body robotic teleoperation system to acquire human-like robot data, 2) a 25-DoF humanoid robot platform with a height-adjustable cart and a 3D LiDAR sensor, and 3) an improved 3D Diffusion Policy learning algorithm for humanoid robots to learn from noisy human data. We run more than 2000 episodes of policy rollouts on the real robot for rigorous policy evaluation. Empowered by this system, we show that using only data collected in one single scene and with only onboard computing, a full-sized humanoid robot can autonomously perform skills in diverse real-world scenarios. Videos are available at this https URL.

213

197

02 Sep 2025

computer-science artificial-intelligence databases

ST-Raptor: LLM-Powered Semi-Structured Table Question Answering

Shanghai Jiao Tong University

Tsinghua University

Renmin University of China Simon Fraser University

ST-Raptor introduces a Hierarchical Orthogonal Tree (HO-Tree) model and a pipeline-based question answering framework to process semi-structured tables. It achieved 72.39% accuracy on the new SSTQA benchmark, outperforming existing baselines by over 10.23%.

269

136

22 Oct 2025

computer-science computer-vision-and-pattern-recognition generative-models

Advances in 4D Representation: Geometry, Motion, and Interaction

University of Alberta Simon Fraser University

A survey provides a representation-centric framework for understanding recent advancements in 4D generation and reconstruction, critically analyzing various representations for geometry, motion, and interaction. It offers a detailed comparison of their properties, challenges, and trade-offs across dimensions such as visual fidelity, scalability, and temporal consistency to guide researchers in selecting and customizing appropriate 4D representations.

805

25 May 2025

computer-science computer-vision-and-pattern-recognition neural-rendering

Triangle Splatting for Real-Time Radiance Field Rendering

University of Toronto

Google DeepMind

University of Oxford

KAUST Simon Fraser University University of Liège

Silvio Giancola

This work introduces Triangle Splatting, a differentiable rendering approach that optimizes unstructured 3D triangles to reconstruct photorealistic scenes from images. The method achieves state-of-the-art visual fidelity, notably improving perceptual quality over prior splatting techniques, and renders at thousands of frames per second, outperforming implicit methods by orders of magnitude.

772
