GrizzlyTech
MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity

MACKO introduces a novel sparse matrix-vector multiplication (SpMV) method and storage format tailored to low, unstructured sparsity in Large Language Models (LLMs) on GPUs. The approach makes deploying pruned LLMs practical: for Llama2-7B at 50% sparsity, it achieves up to a 1.5x speedup and a 1.5x memory reduction compared to dense computation.
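For readers unfamiliar with SpMV, the sketch below shows the operation using the standard CSR (compressed sparse row) layout. This is not the MACKO format, which the summary above does not describe. It is a generic baseline, and all function names are hypothetical. The byte estimate assumes 16-bit values and 32-bit indices. Under those assumptions it shows why low sparsity is a hard regime: at 50% sparsity, plain CSR takes more memory than the dense matrix.

```python
def dense_to_csr(dense):
    """Convert a list-of-rows dense matrix into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # nonzero value
                col_idx.append(j)     # its column index
        row_ptr.append(len(values))   # end of this row in values/col_idx
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is given in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def csr_bytes(rows, nnz, value_bytes=2, index_bytes=4):
    """Storage estimate for CSR (assumed: fp16 values, int32 indices)."""
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes


def dense_bytes(rows, cols, value_bytes=2):
    """Storage for the dense matrix (assumed: fp16 values)."""
    return rows * cols * value_bytes


if __name__ == "__main__":
    A = [[1, 0, 2, 0],
         [0, 3, 0, 0],
         [4, 0, 0, 5]]
    x = [1, 2, 3, 4]
    print(csr_spmv(*dense_to_csr(A), x))

    n = 4096  # illustrative square weight matrix at 50% sparsity
    print(csr_bytes(n, n * n // 2), dense_bytes(n, n))
```

With these assumed widths, each stored nonzero costs 6 bytes in CSR against 2 bytes per element in the dense matrix, so CSR only saves memory once well over half the entries are zero. A format aimed at 50% sparsity therefore has to spend far less on index metadata than CSR does.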
