Qiyuan Tech and Renmin University researchers developed Light-R1, an open-source suite for training long Chain-of-Thought reasoning models using public data, achieving state-of-the-art performance on mathematical reasoning benchmarks. Their Light-R1-32B model, trained for roughly $1,000, surpassed DeepSeek-R1-Distill-Qwen-32B on AIME24 (76.6 vs. 72.6) and AIME25 (64.6 vs. 54.9), while the 14B variant gained roughly 2 points (absolute) on AIME24 through Reinforcement Learning, without the response-length reduction typically seen.
The DCache framework accelerates diffusion-based Large Language Models (dLLMs) with a training-free approximate Key-Value (KV) cache. It delivers average inference-throughput speedups of 3.2x to 4.0x over vanilla dLLM inference while maintaining or improving generation quality across various benchmarks.
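The paper's exact caching policy isn't reproduced here, but the core idea, reusing slightly stale K/V projections across adjacent denoising steps and refreshing them periodically, can be sketched in a few lines. Everything below (`compute_kv`, `cached_denoising_loop`, `refresh_every`) is an illustrative assumption, not DCache's actual API:

```python
import torch

def compute_kv(hidden, w_k, w_v):
    """Project hidden states (seq, dim) to keys and values."""
    return hidden @ w_k, hidden @ w_v

def cached_denoising_loop(x, w_k, w_v, num_steps=16, refresh_every=4):
    """Single-head attention loop that reuses cached K/V between refreshes."""
    k_cache = v_cache = None
    for step in range(num_steps):
        if k_cache is None or step % refresh_every == 0:
            # Exact recomputation: project every token's K/V this step.
            k_cache, v_cache = compute_kv(x, w_k, w_v)
        # Between refreshes, attend against the stale-but-close cache,
        # skipping the per-step K/V projections entirely.
        attn = torch.softmax(x @ k_cache.T / k_cache.shape[-1] ** 0.5, dim=-1)
        x = attn @ v_cache  # simplified block: attention only, no MLP
    return x

# Toy usage: 8 tokens, 32-dim hidden states.
x = torch.randn(8, 32)
w = torch.randn(32, 32)
out = cached_denoising_loop(x, w, w)
```

The refresh interval trades accuracy for speed: hidden states drift slowly between adjacent denoising steps, so stale projections stay close to the exact ones.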
Researchers from Peking University and Qiyuan Tech developed LongRePS, a process-supervised framework that trains language models to generate high-quality reasoning paths for improved long-context performance. The framework significantly enhances reasoning capabilities, achieving gains of up to 13.6 points on individual datasets and enabling smaller open-source models to perform comparably to larger proprietary models on long-context reasoning tasks.
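As a rough illustration of what process supervision over reasoning paths looks like in practice, the sketch below bootstraps training data by sampling several candidate paths per question, scoring each whole path rather than only its final answer, and keeping the best one as a fine-tuning target. The scoring rule and every function name here are assumptions for illustration, not LongRePS's actual pipeline:

```python
import random

def toy_model(question):
    # Stand-in for an LLM call: returns a random "reasoning path".
    steps = [f"step {i}" for i in range(random.randint(1, 12))]
    return {"steps": steps, "answer": random.choice(["A", "B"])}

def path_score(path, gold_answer):
    """Toy process score: final-answer correctness plus a small bonus
    for paths with more intermediate steps (capped at 10)."""
    correct = 1.0 if path["answer"] == gold_answer else 0.0
    return correct + 0.1 * min(len(path["steps"]), 10) / 10

def build_sft_data(model, dataset, n_samples=8):
    """Sample n paths per question; keep the best correct-answer path."""
    sft = []
    for question, gold in dataset:
        paths = [model(question) for _ in range(n_samples)]
        best = max(paths, key=lambda p: path_score(p, gold))
        if best["answer"] == gold:  # discard questions with no correct path
            sft.append({"question": question, "target": best})
    return sft

data = build_sft_data(toy_model, [("q1", "A"), ("q2", "B")])
```

Scoring the path rather than only the answer is what makes the supervision "process-level": a lucky guess with an incoherent trace gets filtered out.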
LongAttn introduces a framework that selects high-quality long-context training data for language models by analyzing token-level attention mechanisms. The method, developed by researchers from Peking University and Qiyuan Tech, consistently improves performance on long-context tasks such as Needle In A Haystack and RULER while reducing the required training-data volume compared to existing methods.
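A minimal version of attention-based data selection can be sketched as follows: run a probe model over each candidate document, measure how much attention mass lands on tokens beyond a distance threshold, and keep the top-scoring documents. The window size, scoring rule, and function names are assumptions, not LongAttn's exact criteria:

```python
import torch

def long_range_score(attn, window=256):
    """attn: (heads, seq, seq) attention weights from a probe model.
    Returns the total attention mass placed on tokens more than
    `window` positions back, averaged over heads."""
    seq = attn.shape[-1]
    q_idx = torch.arange(seq).view(-1, 1)
    k_idx = torch.arange(seq).view(1, -1)
    distant = (q_idx - k_idx) > window  # long-range causal positions
    return attn[:, distant].sum(dim=-1).mean().item()

def select_documents(attn_maps, keep_ratio=0.2):
    """Rank documents by long-range score; keep the top fraction."""
    ranked = sorted(range(len(attn_maps)),
                    key=lambda i: long_range_score(attn_maps[i]),
                    reverse=True)
    return ranked[:max(1, int(len(attn_maps) * keep_ratio))]

# Toy usage: five documents' attention maps (4 heads, 512 tokens each).
maps = [torch.softmax(torch.randn(4, 512, 512), dim=-1) for _ in range(5)]
kept = select_documents(maps)
```

Documents whose tokens genuinely attend far back are kept; documents that only ever attend locally contribute little to long-context training and are dropped.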
Router Upcycling introduces a method that leverages the attention modules of pre-trained dense models to initialize a mixture-of-routers for Mixture-of-Experts (MoE) upcycling. This approach achieves state-of-the-art performance, outperforming vanilla MoE upcycling by 2.05 points on average across ten benchmarks, while adding negligible computational overhead.
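To make the initialization concrete, here is a hedged sketch of one plausible reading: split a dense layer's attention query projection into per-head routers, let each router score the experts against a learned key table, and average the scores. The aggregation scheme and all shapes are illustrative assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn as nn

class UpcycledMixtureOfRouters(nn.Module):
    """Mixture of routers initialized from a dense model's query projection."""
    def __init__(self, w_q, num_heads, num_experts):
        super().__init__()
        d_model = w_q.shape[0]
        head_dim = d_model // num_heads
        self.routers = nn.ModuleList()
        for h in range(num_heads):
            # Each head's slice of the query projection becomes one router.
            proj = nn.Linear(d_model, head_dim, bias=False)
            proj.weight.data.copy_(w_q[:, h * head_dim:(h + 1) * head_dim].T)
            self.routers.append(proj)
        # Routers score experts against a learned expert-key table.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, head_dim))

    def forward(self, x):  # x: (tokens, d_model)
        logits = torch.stack([r(x) @ self.expert_keys.T for r in self.routers])
        return torch.softmax(logits.mean(dim=0), dim=-1)  # (tokens, num_experts)

# Toy usage: pretend w_q comes from a dense checkpoint.
router = UpcycledMixtureOfRouters(torch.randn(64, 64), num_heads=4, num_experts=8)
probs = router(torch.randn(10, 64))
```

Starting the routers from pretrained attention weights, rather than from random initialization, is what makes the approach an "upcycling" of the dense model's existing structure.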