Fast Block Sparse Matrices for PyTorch
Enables networks that are both smaller and faster, to let anybody use neural networks in production at low cost and to improve the experience for the end user.

This PyTorch extension provides a drop-in replacement for torch.nn.Linear using block sparse matrices instead of dense ones. It enables very easy experimentation with sparse matrices since you can directly replace Linear layers in your model with sparse ones.
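For example, swapping a dense layer for a sparse one is a one-line change. The sketch below follows the BlockSparseLinear layer and density argument shown in the project README; it assumes a CUDA GPU, and the exact API should be verified against the current repository.

```python
# Minimal sketch (assumes a CUDA GPU; the block sparse kernels are GPU-only).
# BlockSparseLinear and its density argument follow the project README;
# check the repository for the current API.
import torch
from pytorch_block_sparse import BlockSparseLinear

# Dense baseline: a standard fully connected layer.
dense = torch.nn.Linear(1024, 256).cuda()

# Block sparse drop-in replacement: same input/output sizes, but only
# ~10% of the weight blocks are kept, so it is smaller and faster.
sparse = BlockSparseLinear(1024, 256, density=0.1).cuda()

x = torch.randn(8, 1024).cuda()
print(dense(x).shape)   # torch.Size([8, 256])
print(sparse(x).shape)  # torch.Size([8, 256]) -- same interface as nn.Linear
```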

Blog posts:

  1. Block Sparse Matrices for Smaller and Faster Language Models
  2. Is the future of Neural Networks Sparse? An Introduction (1/N)
  3. Sparse Neural Networks (2/N): Understanding GPU Performance


Authors

@huggingface: Solving NLP, one commit at a time!
@madlag: Ex-CTO of Stupeflix, acquired by GoPro in 2016. Now full-time on AI / deep learning research.
Similar projects

  1. Movement Pruning: Adaptive Sparsity by Fine-Tuning
     Movement pruning is a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning.
  2. Sparse Neural Networks (2/N): Understanding GPU Performance
     NVIDIA Ampere A100 introduces fine-grained structured sparsity.