YanTron Technology Co. Ltd
Researchers introduced EASY-EP, a domain-specific pruning method for large Mixture-of-Experts (MoE) models that uses few-shot demonstrations to identify and retain the critical experts. The approach yielded up to a 4.33x increase in inference throughput and a substantial memory reduction on models such as DeepSeek-R1, while preserving over 90% of the original model's performance across diverse benchmarks.
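The general idea of demonstration-guided expert pruning can be sketched as follows. This is only an illustration under stated assumptions, not EASY-EP's actual scoring rule, which the summary does not specify: here each expert is scored by the total gate weight it receives on tokens from a few demonstrations, and the top-scoring experts are kept. The function name `select_experts`, the `routing_records` format, and the toy routing data are all hypothetical.

```python
from collections import defaultdict


def select_experts(routing_records, num_keep):
    """Pick the experts to retain from few-shot routing statistics.

    routing_records: iterable of per-token lists of (expert_id, gate_weight)
    pairs, as might be logged while running demonstrations through an MoE
    layer (hypothetical format).
    num_keep: number of experts to retain in that layer.
    """
    # Accumulate each expert's total gate weight across all demo tokens.
    scores = defaultdict(float)
    for token_routes in routing_records:
        for expert_id, gate_weight in token_routes:
            scores[expert_id] += gate_weight

    # Rank by descending score; break ties by expert id for determinism.
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return sorted(ranked[:num_keep])


# Toy routing log: three tokens, each routed to two of four experts.
demo_routes = [
    [(0, 0.7), (3, 0.3)],
    [(0, 0.6), (2, 0.4)],
    [(3, 0.5), (0, 0.5)],
]
kept = select_experts(demo_routes, num_keep=2)
print(kept)
```

In a real model the same selection would run per MoE layer, and the weights of unselected experts would be dropped, which is where the memory saving comes from.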