GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

BibTeX
@misc{alahi2024gemgeneralizableegovision,
      title={GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control}, 
      author={Alexandre Alahi and Davide Scaramuzza and Marc Pollefeys and Mathieu Salzmann and Marco Cannici and Suman Saha and Lin Zhang and Xi Wang and Paolo Favaro and Elie Aljalbout and Botao Ye and Aram Davtyan and Ahmad Rahimi and Yasaman Haghighi and Xiaoran Chen and Mariam Hassan and Isinsu Katircioglu and Sebastian Stapf and Pedro M B Rezende and David Brüggemann},
      year={2024},
      eprint={2412.11198},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.11198}, 
}
GitHub: GEM
HTTPS: https://github.com/vita-epfl/GEM
SSH: git@github.com:vita-epfl/GEM.git
CLI: gh repo clone vita-epfl/GEM