Researchers adapted Reinforcement Learning with Verifiable Rewards (RLVR) to forecasting real-world future events, training a 14-billion-parameter model whose probabilistic calibration and predictive accuracy exceeded those of a frontier LLM (OpenAI o1). In simulated prediction market trading, the fine-tuned models achieved a hypothetical return on investment of 10%, performing best where market consensus was uncertain.
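To make the two kinds of result concrete, here is a minimal Python sketch of how forecast quality and a simulated trading return could be computed for binary questions. It is an illustration under stated assumptions, not the study's evaluation code: the summary does not say which metric or trading rule was used, so the Brier score, the fixed-stake betting rule, and the function names `brier_score` and `simulated_roi` are all hypothetical choices.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; a constant 0.5 forecast scores 0.25.
    (Assumed metric: the summary does not name the one actually used.)
    """
    if len(probs) == 0 or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def simulated_roi(
    probs: Sequence[float],
    prices: Sequence[float],
    outcomes: Sequence[int],
    stake: float = 1.0,
) -> float:
    """Return profit divided by total amount staked under a toy betting rule.

    Assumed rule (hypothetical): stake a fixed amount on YES when the model's
    probability exceeds the market price, on NO when it is below, and skip
    the question when they are equal. Prices must lie strictly between 0 and 1.
    """
    total_staked = 0.0
    profit = 0.0
    for p, price, outcome in zip(probs, prices, outcomes):
        if not 0.0 < price < 1.0:
            raise ValueError("market prices must be strictly between 0 and 1")
        if p == price:
            continue  # no perceived edge, so no bet
        if p > price:
            # A YES share costs `price` and pays 1 if the event happens.
            payout = (stake / price) * outcome
        else:
            # A NO share costs `1 - price` and pays 1 if the event does not happen.
            payout = (stake / (1.0 - price)) * (1 - outcome)
        total_staked += stake
        profit += payout - stake
    return profit / total_staked if total_staked else 0.0
```

Under this rule, a model gains only when its probability falls on the correct side of the market price, so questions priced near 0.5 offer the largest payout per unit staked.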