Lightning Rod Labs
Outcome-based Reinforcement Learning to Predict the Future

Researchers adapted Reinforcement Learning with Verifiable Rewards (RLVR) to predict real-world future events, training a 14-billion-parameter model to achieve better probabilistic calibration and predictive accuracy than a frontier LLM (OpenAI o1). The fine-tuned models demonstrated a hypothetical 10% return on investment in simulated prediction market trading, excelling particularly where market consensus was uncertain.
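As a rough illustration of what "verifiable reward" means in a forecasting setting, the sketch below scores a probabilistic forecast against a question's realized binary outcome using a negative Brier score. The data class, function name, and choice of scoring rule are assumptions for illustration, not the reward used in the paper.

```python
from dataclasses import dataclass


@dataclass
class ResolvedQuestion:
    """A forecasting question that has resolved after the model's knowledge cutoff."""
    text: str
    outcome: int  # 1 if the event happened, 0 otherwise


def outcome_reward(predicted_prob: float, question: ResolvedQuestion) -> float:
    """Verifiable reward for a single probabilistic forecast.

    Here the reward is the negative Brier score: a forecast of 0.9 on an event
    that occurred earns -0.01, while 0.1 earns -0.81. Any strictly proper
    scoring rule would reward calibration in the same spirit.
    """
    p = min(max(predicted_prob, 0.0), 1.0)  # clamp to a valid probability
    return -((p - question.outcome) ** 2)


# The verifier can score a model's forecast once the event resolves.
q = ResolvedQuestion(text="Will candidate X win the election?", outcome=1)
print(outcome_reward(0.8, q))  # -0.04 (well calibrated)
print(outcome_reward(0.2, q))  # -0.64 (poorly calibrated)
```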

LLMs Can Teach Themselves to Better Predict the Future
We present an outcome-driven fine-tuning framework that improves the forecasting capabilities of large language models (LLMs) without relying on human-curated reasoning samples. Our method leverages model self-play to generate pairs of reasoning trajectories and probabilistic forecasts for a diverse set of questions that resolve after the models' knowledge cutoff date. We then rank pairs of these reasoning traces by their distance to the actual outcomes before fine-tuning the model via Direct Preference Optimization (DPO). On a separate test set, our approach improves the prediction accuracy of Phi-4 14B and DeepSeek-R1 14B by 7–10% over both the base models and a DPO fine-tuned control model with randomized labels, bringing them on par with the forecasting capabilities of much larger frontier models such as GPT-4o.
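To make the ranking step concrete, the sketch below shows one way self-play samples could be turned into DPO preference pairs by comparing each forecast's distance to the realized outcome. The data format, function names, and margin threshold are assumptions for illustration, not the paper's actual pipeline.

```python
import itertools
from typing import Dict, List


def forecast_error(prob: float, outcome: int) -> float:
    """Distance of a probabilistic forecast from the realized binary outcome."""
    return abs(prob - outcome)


def build_dpo_pairs(question: str,
                    samples: List[Dict],
                    outcome: int,
                    margin: float = 0.05) -> List[Dict]:
    """Turn self-play samples into DPO preference pairs.

    Each sample holds a 'reasoning' trace and a 'prob' forecast, e.g. produced
    by sampling the same model several times on `question`. For every pair of
    samples, the one whose forecast lands closer to the realized outcome
    becomes the 'chosen' completion and the other the 'rejected' one; pairs
    with nearly identical errors are skipped because they carry no clear
    preference signal.
    """
    pairs = []
    for a, b in itertools.combinations(samples, 2):
        err_a = forecast_error(a["prob"], outcome)
        err_b = forecast_error(b["prob"], outcome)
        if abs(err_a - err_b) < margin:
            continue
        chosen, rejected = (a, b) if err_a < err_b else (b, a)
        pairs.append({
            "prompt": question,
            "chosen": chosen["reasoning"],
            "rejected": rejected["reasoning"],
        })
    return pairs


# Example with two hypothetical self-play samples on a question that resolved "yes".
samples = [
    {"reasoning": "Polls are tight but trending up... I estimate 0.7.", "prob": 0.7},
    {"reasoning": "Recent setbacks make this unlikely... I estimate 0.3.", "prob": 0.3},
]
pairs = build_dpo_pairs("Will the bill pass before July?", samples, outcome=1)
```

The resulting preference pairs could then be fed to a standard DPO trainer; the specific trainer and hyperparameters are outside the scope of this sketch.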