JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

BibTeX
@misc{li2025jarvisvlaposttraininglargescale,
      title={JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse},
      author={Muyao Li and Zihao Wang and Kaichen He and Xiaojian Ma and Yitao Liang},
      year={2025},
      eprint={2503.16365},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.16365},
}
GitHub: CraftJarvis/JarvisVLA
HTTPS: https://github.com/CraftJarvis/JarvisVLA
SSH: git@github.com:CraftJarvis/JarvisVLA.git
CLI: gh repo clone CraftJarvis/JarvisVLA