“Model ensembles are a pretty much-guaranteed way to gain 2% of accuracy on anything.” - Andrej Karpathy.
I absolutely agree! However, deploying an ensemble of heavyweight models is not always feasible. Sometimes a single model can be so large (GPT-3, for example) that deploying it in resource-constrained environments is simply not possible. This is why we have been going over model optimization recipes - quantization and pruning. This report is the last one in the series, and in it we will discuss another compelling model optimization technique - knowledge distillation. I have structured the accompanying article into the following sections -