LiAuto Inc.
The Better You Learn, The Smarter You Prune: Towards Efficient Vision-Language-Action Models via Differentiable Token Pruning

LightVLA introduces a differentiable token pruning framework that simultaneously boosts task success rates and reduces computational overhead in Vision-Language-Action (VLA) models, making them more efficient for deployment on resource-constrained platforms. The framework achieved a 2.6% improvement in task success rate and a 59.1% reduction in total FLOPs on the LIBERO benchmark, relative to its foundation model, OpenVLA-OFT.
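LightVLA's exact pruning mechanism isn't reproduced here, but a common way to make token selection differentiable is a Gumbel-softmax relaxation over per-token keep/drop logits: soft probabilities flow gradients during training, while a hard top-k selection is applied at inference. The sketch below is illustrative only; all function and variable names are assumptions, not the paper's API.

```python
import numpy as np

def gumbel_softmax_keep_prob(scores, tau=1.0, seed=None):
    """Soft keep/drop decision per token via a Gumbel-softmax relaxation.

    scores: (num_tokens, 2) learned logits for [drop, keep] per token.
    Returns a (num_tokens,) soft keep-probability; in a real training
    setup these would stay differentiable w.r.t. the logits.
    """
    rng = np.random.default_rng(seed)
    # Sample Gumbel(0, 1) noise to relax the discrete choice.
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, scores.shape)))
    y = (scores + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    y = y / y.sum(axis=-1, keepdims=True)
    return y[:, 1]  # probability of keeping each token

def prune_tokens(tokens, keep_prob, budget):
    """Hard-select the `budget` highest-scoring tokens at inference time."""
    idx = np.argsort(-keep_prob)[:budget]
    return tokens[np.sort(idx)]  # preserve original token order

# Toy usage: 8 visual tokens of dim 4, keep only 3.
tokens = np.random.default_rng(0).normal(size=(8, 4))
scores = np.random.default_rng(1).normal(size=(8, 2))
kp = gumbel_softmax_keep_prob(scores, tau=0.5, seed=2)
kept = prune_tokens(tokens, kp, budget=3)
```

Dropping visual tokens this way shrinks the sequence the language backbone must attend over, which is where the FLOPs reduction comes from.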


AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

Researchers from LiAuto Inc. developed the AVA-VLA framework, reformulating Vision-Language-Action models from a Partially Observable Markov Decision Process perspective, which allows for dynamic visual attention based on historical context. The system achieves state-of-the-art success rates on robot manipulation benchmarks and demonstrates robust real-world performance on a dual-arm robot.
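The POMDP framing means the agent's attention is conditioned on a belief over history rather than on the current frame alone. One minimal way to realize this (a sketch under assumptions, not AVA-VLA's actual architecture) is a recurrent history state that generates the query for attention over visual patches:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def active_attention_step(patches, h, W_h, W_p):
    """One step of history-conditioned visual attention.

    patches: (N, d) visual patch features for the current frame.
    h:       (k,)   recurrent summary of past observations.
    The attention logits depend on the history state, so where the
    model looks changes with what it has already seen.
    """
    query = W_h @ h                       # (d,) history-derived query
    logits = patches @ query              # (N,) relevance of each patch
    attn = softmax(logits)                # attention distribution
    context = attn @ patches              # (d,) attended visual feature
    h_next = np.tanh(W_p @ context + h)   # fold the observation into history
    return attn, h_next

# Toy usage: 6 patches of dim 4, history state of dim 3.
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))
h0 = rng.normal(size=3)
W_h = rng.normal(size=(4, 3))
W_p = rng.normal(size=(3, 4))
attn, h1 = active_attention_step(patches, h0, W_h, W_p)
```

With a fixed (e.g. zero) history state, the attention would collapse to a static pattern; the recurrence is what makes it "active" across a manipulation episode.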

COPO: Consistency-Aware Policy Optimization

The COPO framework enhances Large Language Models' reasoning capabilities by resolving vanishing gradients in Group-Relative Policy Optimization (GRPO). It integrates local and global optimization strategies, ensuring all training samples contribute meaningful learning signals, leading to superior performance on mathematical reasoning tasks.
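The vanishing-gradient issue is visible directly in GRPO's group-relative advantage, which standardizes rewards within the group of responses sampled for one prompt. When every response in a group receives the same reward (all correct or all wrong), every advantage is zero and those samples contribute no policy gradient. A minimal sketch (illustrative, not COPO's implementation):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within a group of
    responses sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed outcomes: advantages are roughly +1/-1, a useful signal.
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Uniform outcomes: r - mean is exactly zero for every sample, so the
# advantages, and hence the policy gradient, vanish for this group.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

COPO's global component supplies a learning signal in exactly these degenerate groups, so no sampled batch is wasted.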
