Model Compression

Model compression is a technique that shrinks trained neural networks. Compressed models often perform similarly to the original while using a fraction of the computational resources.


A Survey of Methods for Model Compression in NLP
A look at model compression techniques applied on top of pre-trained base models to reduce the computational cost of prediction.
model-compression pruning knowledge-distillation precision-reduction
Compression of Deep Learning Models for Text: A Survey
In this survey, we discuss six different types of methods for compressing deep learning models for text, to enable their deployment in real industry NLP projects.
pruning quantization knowledge-distillation parameter-sharing


Do We Really Need Model Compression?
In this blog post, we’ll explore the obstacles involved in training small models from scratch.
model-compression tutorial article
NLP for Developers: Shrinking Transformers | Rasa
In this video, Rasa Senior Developer Advocate Rachael will talk about different approaches to make transformer models smaller.
model-compression distillation pruning transformers
PyTorch Pruning | How it's Made by Michela Paganini
In this talk, you will learn about pruning, why it's important and how to get started using PyTorch's Pruning (torch.nn.utils.prune).
pruning pytorch model-compression video
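The idea behind `torch.nn.utils.prune`'s L1 unstructured pruning can be sketched without any framework: zero out the fraction of weights with the smallest magnitudes and keep a binary mask. A minimal illustrative sketch (function and variable names are my own, not PyTorch's):

```python
def l1_unstructured_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest |w|,
    returning (pruned_weights, mask). Conceptually what
    torch.nn.utils.prune.l1_unstructured does to a parameter tensor."""
    n_prune = int(len(weights) * amount)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1.0] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0.0
    return [w * m for w, m in zip(weights, mask)], mask

pruned, mask = l1_unstructured_prune([0.5, -0.1, 2.0, 0.05], amount=0.5)
# The two smallest-magnitude weights (-0.1 and 0.05) are zeroed.
```

In PyTorch the mask is stored alongside the original parameter so pruning can be made permanent or undone; the sketch above only captures the selection criterion.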


KD Lib
A PyTorch library to easily facilitate knowledge distillation for custom deep learning models.
knowledge-distillation model-compression pytorch code
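Libraries like KD Lib automate the standard distillation recipe: train the student to match the teacher's temperature-softened output distribution. The core soft-target loss can be sketched framework-free (all names below are illustrative, not KD Lib's API):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over temperature-scaled logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened
    distributions: zero when they match, positive otherwise."""
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss = distillation_loss([2.0, 0.5, 0.1], [3.0, 0.2, 0.0])
```

In practice this term is combined with the ordinary cross-entropy on hard labels, weighted by a mixing coefficient.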
AIMET - Model Efficiency Toolkit
AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models.
quantization model-compression aimet qualcomm
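Post-training quantization toolkits like AIMET ultimately rest on affine quantization: map floats to low-bit integers via a scale and zero-point. A hedged, framework-free sketch of the mapping (this is the general scheme, not AIMET's actual API):

```python
def quantize(values, num_bits=8):
    """Affine (asymmetric) quantization: real ~= scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero for constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integers back to approximate real values."""
    return [scale * (qi - zero_point) for qi in q]

q, s, z = quantize([-1.0, 0.0, 0.5, 1.0])
restored = dequantize(q, s, z)  # close to the originals, small rounding error
```

Real toolkits add per-channel scales, calibration over activation statistics, and quantization-aware fine-tuning on top of this primitive.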
AquVitae: The Easiest Knowledge Distillation Library
AquVitae is a Python library that makes knowledge distillation easy through a very simple API. It supports both TensorFlow and PyTorch.
tensorflow pytorch light-weight deep-learning
A PyTorch-based model distillation toolkit for natural language processing.
model-distillation natural-language-processing model-compression distillation
Movement Pruning: Adaptive Sparsity by Fine-Tuning
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning.
pruning movement-pruning sparsity adaptive-sparsity
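The paper's first-order criterion scores each weight by how far fine-tuning is pushing it away from zero: the score accumulates `-gradient * weight` each step, and the lowest-scoring weights are masked. An illustrative sketch of that scoring rule (my own simplification, not the authors' code):

```python
def movement_scores(weight_history, grad_history):
    """Accumulate the first-order movement score S_i = sum_t (-g_i * w_i):
    positive when training pushes a weight away from zero."""
    n = len(weight_history[0])
    scores = [0.0] * n
    for weights, grads in zip(weight_history, grad_history):
        for i in range(n):
            scores[i] += -grads[i] * weights[i]
    return scores

def movement_prune(weights, scores, amount):
    """Zero the `amount` fraction of weights with the lowest scores."""
    n_prune = int(len(weights) * amount)
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

# A large weight being driven toward zero is pruned, while a small but
# growing weight survives -- the key contrast with magnitude pruning.
scores = movement_scores([[2.0, 0.1]], [[1.0, -1.0]])
pruned = movement_prune([2.0, 0.1], scores, amount=0.5)
```

The actual method learns the scores jointly with the weights via a straight-through estimator; the sketch only shows why "movement", not magnitude, decides what is kept.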