Knowledge distillation is a technique for transferring the knowledge of a large, accurate teacher network into a smaller student network. The student is trained to match the teacher's output distribution, typically the teacher's temperature-softened probabilities (soft labels), usually in combination with the standard cross-entropy loss on the ground-truth hard labels.
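A minimal sketch of this combined loss, assuming PyTorch; the temperature, weighting factor, and function name here are illustrative choices, not taken from any of the references below.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Blend a soft-label term (KL divergence against the teacher's
    temperature-softened distribution) with a hard-label cross-entropy term."""
    # Soften both the teacher's and the student's distributions with the same temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(log_probs, soft_targets,
                         reduction="batchmean") * temperature ** 2
    # Ordinary supervised loss on the ground-truth hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

During training, the teacher is run in inference mode to produce `teacher_logits`, and only the student's parameters are updated with this loss.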
- MIT HAN Lab Lecture on Knowledge Distillation (I)
- Awesome Knowledge Distillation
- knowledge-distillation-pytorch