On Designing Effective RL Reward at Training Time for LLM Reasoning
BibTeX
@misc{gao2024designingeffectiverl,
  title={On Designing Effective RL Reward at Training Time for LLM Reasoning},
  author={Jiaxuan Gao and Yi Wu and Weilin Liu and Shusheng Xu and Wei Fu and Wenjie Ye and Zhiyu Mei and Guangju Wang and Chuyi He},
  year={2024},
  eprint={2410.15115},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.15115},
}