SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

BibTeX

@misc{zhang2025srpoenhancingmultimodal,
      title={SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning},
      author={Yu Zhang and Yangfan He and Yifan Jiang and Che Liu and Jing Xiong and Mi Zhang and Zhongwei Wan and Shen Yan and Yi Xin and Hui Shen and Zhihao Dou and Dongfei Cui and Qinjian Zhao},
      year={2025},
      eprint={2506.01713},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.01713},
}
