State Reconstruction for Diffusion Policies (SRDP) improves out-of-distribution generalization in offline reinforcement learning by integrating an auxiliary state reconstruction objective into diffusion policies. This method demonstrates a 167% performance improvement over Diffusion-QL in maze navigation with missing data and achieves a Chamfer distance of 0.071 ± 0.02 against ground truth in real-world UR10 robot experiments.
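The abstract describes adding an auxiliary state reconstruction objective to a diffusion policy. The sketch below is a minimal, hypothetical illustration of that idea (not the authors' code): a shared state encoder feeds both a denoising head and a state decoder, and the training loss sums the diffusion denoising term with a weighted reconstruction term. All module names, the simplified noising schedule, and the loss weight are assumptions for illustration.

```python
# Illustrative sketch of a diffusion policy with an auxiliary state
# reconstruction loss. Architecture and noising schedule are simplified
# assumptions, not the SRDP implementation.
import torch
import torch.nn as nn

class SRDPSketch(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        # Shared state encoder feeding both the denoiser and the decoder.
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Denoiser predicts the noise added to the action at diffusion step t.
        self.denoiser = nn.Sequential(
            nn.Linear(hidden + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))
        # Decoder reconstructs the input state from the shared representation.
        self.decoder = nn.Linear(hidden, state_dim)

    def forward(self, state, noisy_action, t):
        z = self.encoder(state)
        eps_pred = self.denoiser(torch.cat([z, noisy_action, t], dim=-1))
        state_recon = self.decoder(z)
        return eps_pred, state_recon

def srdp_loss(model, state, action, recon_weight=0.5):
    # Sample a diffusion step and corrupt the action with Gaussian noise
    # (a simplified linear interpolation stands in for the real schedule).
    t = torch.rand(action.size(0), 1)
    noise = torch.randn_like(action)
    noisy_action = (1 - t) * action + t * noise
    eps_pred, state_recon = model(state, noisy_action, t)
    diffusion_loss = ((eps_pred - noise) ** 2).mean()
    recon_loss = ((state_recon - state) ** 2).mean()
    return diffusion_loss + recon_weight * recon_loss
```

The reconstruction term encourages the shared representation to capture the full state, which is the mechanism the abstract credits for better behavior on out-of-distribution states.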
As humans learn new skills and apply their existing knowledge while maintaining previously learned information, "continual learning" in machine learning aims to incorporate new data while retaining and utilizing past knowledge. However, existing machine learning methods often do not mimic human learning, where tasks are intermixed due to individual preferences and environmental conditions. Humans typically switch between tasks instead of completely mastering one task before proceeding to the next. To explore how human-like task switching can enhance learning efficiency, we propose a multi-task learning architecture that alternates tasks based on task-agnostic measures such as "learning progress" and "neural computational energy expenditure". To evaluate the efficacy of our method, we run several systematic experiments using a set of effect-prediction tasks executed by a simulated manipulator robot. The experiments show that our approach surpasses random interleaved and sequential task learning in terms of average learning accuracy. Moreover, by including energy expenditure in the task-switching logic, our approach still performs favorably while reducing neural energy expenditure.
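To make the task-switching idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): each task keeps a window of recent prediction errors, "learning progress" is the drop in error between the older and newer halves of that window, and the next task maximizes progress minus a weighted energy estimate. Window size, the energy update rule, and the weight are illustrative assumptions.

```python
# Illustrative task-switching logic driven by learning progress with an
# optional penalty for estimated neural energy expenditure. All constants
# and update rules are assumptions for the sketch.
from collections import deque

class TaskSwitcher:
    def __init__(self, n_tasks, window=10, energy_weight=0.1):
        self.errors = [deque(maxlen=window) for _ in range(n_tasks)]
        self.energy = [0.0] * n_tasks          # running per-task energy estimate
        self.energy_weight = energy_weight

    def update(self, task_id, prediction_error, energy_cost):
        self.errors[task_id].append(prediction_error)
        self.energy[task_id] = 0.9 * self.energy[task_id] + 0.1 * energy_cost

    def learning_progress(self, task_id):
        errs = list(self.errors[task_id])
        if len(errs) < 2:
            return float("inf")                # unexplored tasks get priority
        half = len(errs) // 2
        # Progress = drop in mean error from the older half to the newer half.
        return sum(errs[:half]) / half - sum(errs[half:]) / (len(errs) - half)

    def next_task(self):
        # Pick the task with the best progress-per-energy trade-off.
        scores = [self.learning_progress(i) - self.energy_weight * self.energy[i]
                  for i in range(len(self.errors))]
        return max(range(len(scores)), key=scores.__getitem__)
```

Setting `energy_weight` to zero recovers a pure learning-progress scheduler, while a positive weight trades some progress for lower energy expenditure, mirroring the comparison reported in the abstract.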