Publications

You can also find my articles on my Google Scholar profile.

2023


Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shu Zhao, Peng Zhang, Jie Tang (code, PDF)

![main pic](../images/glmd.jpg)
A general language model distillation (GLMD) method that performs two-stage word prediction distillation together with vocabulary compression. The method is simple yet achieves surprisingly strong performance.