R1-Searcher, from Renmin University of China, introduces a two-stage outcome-based reinforcement learning framework that enables Large Language Models to autonomously invoke and leverage external search systems. This approach significantly outperforms strong RAG baselines on multi-hop question answering benchmarks and demonstrates robust generalization to out-of-domain and online search scenarios.
R1-Searcher++ presents a framework that enables large language models to dynamically choose between using their internal knowledge and performing external searches, while also allowing them to internalize retrieved information. This approach improves performance on multi-hop question answering tasks and significantly reduces the number of external retrieval calls compared to prior methods.
Researchers from Renmin University and BAAI present a comprehensive empirical study of reinforcement learning techniques for enhancing LLM reasoning capabilities. By combining novel reward engineering with external computation tools during RL training, their approach achieves 86.67% accuracy on AIME 2024 mathematics problems.
This paper from Renmin University, DataCanvas, and BAAI presents a systematic approach to improving the training effectiveness and stability of tool-augmented reinforcement learning for code-integrated reasoning in large language models. The method achieves state-of-the-art performance on mathematical reasoning benchmarks and provides mechanistic insights into how code integration extends model capabilities, while offering greater efficiency than traditional reasoning methods.