Publications
You can also find my articles on my Google Scholar profile.
2023
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shu Zhao, Peng Zhang, Jie Tang. code, PDF
![main pic](../images/glmd.jpg)
A general language model distillation (GLMD) method that performs two-stage word prediction distillation together with vocabulary compression. Despite its simplicity, it shows surprisingly strong performance.
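The two ideas in the blurb can be sketched roughly as follows. This is not the paper's implementation, just a minimal NumPy illustration under my own assumptions: `word_prediction_kd_loss` stands in for distilling the teacher's next-word distribution into the student, and `compress_vocab` stands in for restricting that distribution to a reduced vocabulary; both function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_prediction_kd_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened next-word
    # distributions; a common word-level distillation objective.
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    return float(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12))))

def compress_vocab(logits, keep_ids):
    # Illustrative vocabulary compression: keep only a subset of
    # vocabulary entries (hypothetical selection of `keep_ids`).
    return logits[..., keep_ids]

# Toy usage over a 4-token vocabulary.
teacher = np.array([2.0, 1.0, 0.5, -1.0])
student = np.array([1.5, 1.2, 0.3, -0.5])
loss = word_prediction_kd_loss(teacher, student)
reduced = compress_vocab(teacher, [0, 1])
```

The loss is zero when student and teacher agree and positive otherwise, which is the signal the student is trained to minimize.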