JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
BibTeX
@misc{li2025jarvisvlaposttraininglargescale,
      title={JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse},
      author={Muyao Li and Zihao Wang and Kaichen He and Xiaojian Ma and Yitao Liang},
      year={2025},
      eprint={2503.16365},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.16365},
}