Distilling Knowledge in Neural Networks
This project demonstrates a compelling model optimization technique, knowledge distillation, with code walkthroughs in TensorFlow.

“Model ensembles are a pretty much guaranteed way to gain 2% of accuracy on anything.” - Andrej Karpathy.

I absolutely agree! However, deploying an ensemble of heavyweight models is not always feasible. Sometimes, even a single model can be so large (GPT-3, for example) that deploying it in resource-constrained environments is simply not possible. This is why we have been going over model optimization recipes - Quantization and Pruning. This report, the last one in the series, discusses another compelling model optimization technique - knowledge distillation (a minimal code sketch follows the outline below). I have structured the accompanying article into the following sections -

  • What is softmax telling us?
  • Using the softmax information for teaching - Knowledge distillation
  • Loss functions in knowledge distillation
  • A few training recipes
  • Experimental results
  • Conclusion
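
To make the idea concrete before diving into the article, here is a minimal sketch of a distillation loss in TensorFlow. The function names, the temperature of 5.0, and the weighting factor `alpha` are illustrative assumptions, not taken from the article's code; the article itself walks through the full training setup.

```python
# A minimal sketch, assuming `teacher_logits` and `student_logits` are the raw
# (pre-softmax) outputs of a teacher and a student tf.keras model.
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=5.0):
    """Soft-target loss: KL divergence between temperature-softened
    teacher and student distributions (Hinton et al., 2015)."""
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)
    soft_student = tf.nn.softmax(student_logits / temperature)
    kl = tf.keras.losses.KLDivergence()(soft_teacher, soft_student)
    # Scale by T^2 so the gradient magnitude stays comparable across temperatures.
    return kl * (temperature ** 2)

def total_loss(y_true, teacher_logits, student_logits,
               temperature=5.0, alpha=0.1):
    """Weighted sum of the hard-label loss and the distillation loss.
    `alpha` balances ground-truth supervision against the teacher's signal."""
    hard = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(
        y_true, student_logits)
    soft = distillation_loss(teacher_logits, student_logits, temperature)
    return alpha * hard + (1.0 - alpha) * soft
```

In practice, this combined loss is computed inside a custom training step: the teacher runs in inference mode to produce its logits, and only the student's weights are updated.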
