LightVLA introduces a differentiable token pruning framework that simultaneously boosts task success rates and reduces computational overhead in Vision-Language-Action (VLA) models, making them more efficient for deployment on resource-constrained platforms. The framework achieved a 2.6% improvement in task success rate and a 59.1% reduction in total FLOPs on the LIBERO benchmark, relative to its foundation model, OpenVLA-OFT.
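The core idea of differentiable token pruning can be sketched as scoring visual tokens against a task embedding, keeping only the top-scoring tokens, and weighting the survivors by soft scores so gradients can flow through the selection during training. This is a minimal illustration under assumed shapes and names (`soft_token_prune`, `keep_ratio`, `tau` are hypothetical), not the LightVLA implementation:

```python
import numpy as np

def soft_token_prune(tokens, query, keep_ratio=0.5, tau=1.0):
    """Illustrative differentiable token pruning sketch (not LightVLA's code).

    tokens: (N, D) visual token embeddings
    query:  (D,) task/instruction embedding
    """
    # Relevance of each visual token to the task query.
    scores = tokens @ query / np.sqrt(tokens.shape[1])
    # Soft, differentiable keep-weights via a temperature softmax.
    weights = np.exp(scores / tau)
    weights /= weights.sum()
    # Hard top-k selection of the most relevant tokens.
    k = max(1, int(keep_ratio * len(tokens)))
    keep = np.argsort(weights)[-k:]
    # Surviving tokens are modulated by their soft weights, so in an
    # autodiff framework gradients would flow through the scorer.
    return tokens[keep] * weights[keep, None] * len(tokens)

rng = np.random.default_rng(0)
visual_tokens = rng.standard_normal((8, 4))
task_query = rng.standard_normal(4)
pruned = soft_token_prune(visual_tokens, task_query, keep_ratio=0.5)
print(pruned.shape)  # (4, 4): half the tokens remain
```

Dropping half the visual tokens before the language-model backbone is what drives the FLOPs reduction: attention and MLP cost scale with sequence length.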
Researchers from LiAuto Inc. developed the AVA-VLA framework, reformulating Vision-Language-Action models from a Partially Observable Markov Decision Process perspective, which allows for dynamic visual attention based on historical context. The system achieves state-of-the-art success rates on robot manipulation benchmarks and demonstrates robust real-world performance on a dual-arm robot.
The COPO framework enhances Large Language Models' reasoning capabilities by resolving vanishing gradients in Group-Relative Policy Optimization (GRPO). It integrates local and global optimization strategies, ensuring all training samples contribute meaningful learning signals, leading to superior performance on mathematical reasoning tasks.
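The vanishing-gradient issue in GRPO is easy to see from its group-relative advantage: each sampled response's reward is normalized against the group's mean and standard deviation, so when every response in a group earns the same reward (all correct or all wrong), every advantage is zero and the group contributes no learning signal. A minimal sketch (illustrative only, not the COPO implementation):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize rewards within a sampled group.
    Illustrative sketch; `eps` avoids division by zero."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed outcomes yield informative, nonzero advantages.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [ 1. -1.  1. -1.]

# Uniform outcomes yield all-zero advantages: the policy gradient
# for this group vanishes, which is the failure mode COPO targets.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))  # [0. 0. 0. 0.]
```

COPO's local/global combination is described as keeping such uniform-reward groups from being wasted, so every training sample still contributes a learning signal.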