GenPO introduces a framework that integrates generative diffusion policies into on-policy reinforcement learning, enabling tractable likelihood computations. The method consistently achieves higher returns and enhanced stability across diverse IsaacLab robotic control tasks compared to existing model-free and off-policy diffusion RL baselines.
Researchers at ShanghaiTech University developed DreamPolicy, a unified framework for versatile humanoid locomotion that enables a single policy to generalize zero-shot across diverse and unseen terrains. It achieves this by integrating offline data, a diffusion-based trajectory planner that generates 'Humanoid Motion Imagery', and a physics-constrained reinforcement learning policy, outperforming existing methods by an average of 20% in unseen environments.