Acceleration and Model Compression
Papers
Distilling the Knowledge in a Neural Network
- intro: NIPS 2014 Deep Learning Workshop
- author: Geoffrey Hinton, Oriol Vinyals, Jeff Dean
- arxiv: http://arxiv.org/abs/1503.02531
- blog: http://fastml.com/geoff-hintons-dark-knowledge/
- notes: https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/distilling-the-knowledge-in-a-nn.md
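The temperature-softened distillation loss introduced in this paper underlies most of the entries below: the student matches the teacher's temperature-scaled class probabilities in addition to the hard labels. A minimal PyTorch sketch, with the values of `T` and `alpha` chosen for illustration only:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD: soft-target KL term plus hard-label cross-entropy."""
    # Soft targets: KL between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the soft-target gradients keep their magnitude, as in the paper
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```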
Deep Model Compression: Distilling Knowledge from Noisy Teachers
- arxiv: https://arxiv.org/abs/1610.09650
- github: https://github.com/chengshengchan/model_compression
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
- intro: CVPR 2017
- paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- intro: TuSimple
- arxiv: https://arxiv.org/abs/1707.01219
- github: https://github.com/TuSimple/neuron-selectivity-transfer
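As a rough illustration of the neuron selectivity transfer idea, the sketch below matches the distributions of per-channel spatial activation patterns between teacher and student with an MMD term. It uses a plain linear kernel for brevity (the paper works with polynomial and Gaussian kernels), and the function name and shape assumptions are mine, not from the authors' code:

```python
import torch
import torch.nn.functional as F

def nst_loss_linear(f_s, f_t):
    """Linear-kernel MMD between the sets of normalized per-channel activation maps."""
    # f_s, f_t: feature maps of shape (N, C, H, W); spatial sizes are assumed to match.
    n, c_s, h, w = f_s.shape
    s = F.normalize(f_s.view(n, c_s, h * w), dim=2)            # each channel -> unit-norm pattern
    t = F.normalize(f_t.view(n, f_t.size(1), h * w), dim=2)
    g_ss = torch.bmm(s, s.transpose(1, 2)).mean()              # student-student kernel mean
    g_tt = torch.bmm(t, t.transpose(1, 2)).mean()              # teacher-teacher kernel mean
    g_st = torch.bmm(s, t.transpose(1, 2)).mean()              # cross term
    return g_ss + g_tt - 2.0 * g_st                            # (biased) MMD^2 estimate
```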
Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- arxiv: https://arxiv.org/abs/1709.00513
Data-Free Knowledge Distillation for Deep Neural Networks
- arxiv: https://arxiv.org/abs/1710.07535
Knowledge Projection for Deep Neural Networks
- arxiv: https://arxiv.org/abs/1710.09505
Moonshine: Distilling with Cheap Convolutions
- arxiv: https://arxiv.org/abs/1711.02613
model_compression: Implementation of model compression with the knowledge distillation method
- github: https://github.com/chengshengchan/model_compression
Neural Network Distiller
- intro: Neural Network Distiller: a Python package for neural network compression research
- project page: https://nervanasystems.github.io/distiller/
- github: https://github.com/NervanaSystems/distiller
Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
- intro: The Johns Hopkins University
- arxiv: https://arxiv.org/abs/1805.05551
Improving Knowledge Distillation with Supporting Adversarial Samples
- arxiv: https://arxiv.org/abs/1805.05532
Recurrent knowledge distillation
- intro: ICIP 2018
- arxiv: https://arxiv.org/abs/1805.07170
Knowledge Distillation by On-the-Fly Native Ensemble
- arxiv: https://arxiv.org/abs/1806.04606
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher
- intro: Washington State University & DeepMind
- arxiv: https://arxiv.org/abs/1902.03393
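The teacher-assistant idea is to distill through one or more intermediate-size networks instead of going directly from a very large teacher to a small student. A schematic sketch, where `train_with_distillation` is a placeholder for any standard KD training loop (e.g. the Hinton-style loss above):

```python
def distill_chain(models, train_with_distillation, dataloader):
    """Distill down a chain of models ordered from largest (pretrained teacher) to smallest student."""
    teacher = models[0]
    for student in models[1:]:
        train_with_distillation(student, teacher, dataloader)
        teacher = student  # the freshly trained model becomes the teacher for the next step
    return models[-1]
```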
Correlation Congruence for Knowledge Distillation
- intro: NUDT & SenseTime & BUAA & CUHK
- keywords: Correlation Congruence Knowledge Distillation (CCKD)
- arxiv: https://arxiv.org/abs/1904.01802
Similarity-Preserving Knowledge Distillation
- intro: ICCV 2019
- arxiv: https://arxiv.org/abs/1907.09682
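A short sketch of the similarity-preserving objective: the pairwise similarity structure of a mini-batch (which inputs produce similar activations) should agree between teacher and student, even when their feature dimensions differ. Shapes and the function name are illustrative:

```python
import torch.nn.functional as F

def sp_loss(f_s, f_t):
    """Match row-normalized batch similarity matrices of student and teacher features."""
    # f_s, f_t: feature maps of shape (N, C, H, W); channel/spatial sizes may differ.
    n = f_s.size(0)
    g_s = F.normalize(f_s.view(n, -1) @ f_s.view(n, -1).t(), p=2, dim=1)  # N x N similarities
    g_t = F.normalize(f_t.view(n, -1) @ f_t.view(n, -1).t(), p=2, dim=1)
    return ((g_s - g_t) ** 2).sum() / (n * n)
```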
Highlight Every Step: Knowledge Distillation via Collaborative Teaching
- arxiv: https://arxiv.org/pdf/1907.09643.pdf
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks
- arxiv: https://arxiv.org/abs/1909.08097
Revisit Knowledge Distillation: a Teacher-free Framework
- arxiv: https://arxiv.org/abs/1909.11723
- github: https://github.com/yuanli2333/Teacher-free-Knowledge-Distillation
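One variant proposed here replaces the real teacher with a hand-designed "virtual teacher" that places high probability on the correct class and spreads the rest uniformly. The sketch below is a rough rendering of that idea; `correct_prob` and `T` are illustrative values, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def virtual_teacher_loss(student_logits, labels, correct_prob=0.9, T=20.0):
    """Cross-entropy plus a KD-style KL term toward a hand-crafted target distribution."""
    num_classes = student_logits.size(1)
    wrong_prob = (1.0 - correct_prob) / (num_classes - 1)
    p_teacher = torch.full_like(student_logits, wrong_prob)
    p_teacher.scatter_(1, labels.unsqueeze(1), correct_prob)   # peak on the ground-truth class
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1), p_teacher,
                    reduction="batchmean") * (T * T)
    return F.cross_entropy(student_logits, labels) + soft
```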
On the Efficacy of Knowledge Distillation
- intro: Cornell University
- arxiv: https://arxiv.org/abs/1910.01348
Training convolutional neural networks with cheap convolutions and online distillation
- arxiv: https://arxiv.org/abs/1909.13063
- github: https://github.com/EthanZhangYC/OD-cheap-convolution
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation
- arxiv: https://arxiv.org/abs/1911.05329
Preparing Lessons: Improve Knowledge Distillation with Better Supervision
- intro: Xi’an Jiaotong University & Meituan
- keywords: Knowledge Adjustment (KA), Dynamic Temperature Distillation (DTD)
- arxiv: https://arxiv.org/abs/1911.07471
QKD: Quantization-aware Knowledge Distillation
- arxiv: https://arxiv.org/abs/1911.12491
Explaining Knowledge Distillation by Quantifying the Knowledge
- arxiv: https://arxiv.org/abs/2003.03622
Knowledge distillation via adaptive instance normalization
- arxiv: https://arxiv.org/abs/2003.04289
Distilling Knowledge from Graph Convolutional Networks
- intro: CVPR 2020
- arxiv: https://arxiv.org/abs/2003.10477
Regularizing Class-wise Predictions via Self-knowledge Distillation
- intro: CVPR 2020
- arxiv: https://arxiv.org/abs/2003.13964
Online Knowledge Distillation with Diverse Peers
- intro: AAAI 2020
- arxiv: https://arxiv.org/abs/1912.00350
- github: https://github.com/DefangChen/OKDDip
Channel Distillation: Channel-Wise Attention for Knowledge Distillation
Peer Collaborative Learning for Online Knowledge Distillation
- intro: Queen Mary University of London
- arxiv: https://arxiv.org/abs/2006.04147
Knowledge Distillation for Multi-task Learning
- intro: University of Edinburgh
- arxiv: https://arxiv.org/abs/2007.06889
Differentiable Feature Aggregation Search for Knowledge Distillation
https://arxiv.org/abs/2008.00506
Prime-Aware Adaptive Distillation
- intro: ECCV 2020
- arxiv: https://arxiv.org/abs/2008.01458
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
- intro: ECCV 2020
- arxiv: https://arxiv.org/abs/2008.07816
- github: https://github.com/sundw2014/DCM
Matching Guided Distillation
- intro: ECCV 2020
- intro: Aibee Inc.
- project page: http://kaiyuyue.com/mgd/
- arxiv: https://arxiv.org/abs/2008.09958
Domain Adaptation Through Task Distillation
- intro: ECCV 2020
- arxiv: https://arxiv.org/abs/2008.11911
- github: https://github.com/bradyz/task-distillation
Spherical Knowledge Distillation
- arxiv: https://arxiv.org/abs/2010.07485
In Defense of Feature Mimicking for Knowledge Distillation
- intro: Nanjing University
- arxiv: https://arxiv.org/abs/2011.01424
Online Ensemble Model Compression using Knowledge Distillation
- intro: CMU
- arxiv: https://arxiv.org/abs/2011.07449
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
- intro: CVPR 2021
- arxiv: https://arxiv.org/abs/2103.08273
- github (PyTorch): https://github.com/MingiJi/FRSKD
Decoupled Knowledge Distillation
- intro: CVPR 2022
- intro: MEGVII Technology & Waseda University & Tsinghua University
- arxiv: https://arxiv.org/abs/2203.08679
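The paper rewrites the classical KD term as a target-class part (TCKD, over the binary target vs. non-target split) plus a non-target-class part (NCKD, over the remaining classes), and weights the two independently. A rough sketch of that decomposition; hyperparameter values are illustrative and reduction details are simplified:

```python
import torch
import torch.nn.functional as F

def dkd_loss(logits_s, logits_t, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD sketch: alpha * TCKD + beta * NCKD."""
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    mask = F.one_hot(target, num_classes=logits_s.size(1)).bool()

    # TCKD: KL over the binary (target, non-target) probabilities.
    b_s = torch.stack([p_s[mask], 1.0 - p_s[mask]], dim=1)
    b_t = torch.stack([p_t[mask], 1.0 - p_t[mask]], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean") * (T * T)

    # NCKD: KL over the non-target classes, renormalized with the target logit masked out.
    log_ns = F.log_softmax(logits_s / T - 1000.0 * mask, dim=1)
    ns_t = F.softmax(logits_t / T - 1000.0 * mask, dim=1)
    nckd = F.kl_div(log_ns, ns_t, reduction="batchmean") * (T * T)

    return alpha * tckd + beta * nckd
```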
Knowledge Distillation via the Target-aware Transformer
- intro: CVPR 2022 Oral
- intro: RMIT University & Alibaba Group & ReLER & Sun Yat-sen University
- arxiv: https://arxiv.org/abs/2205.10793
Knowledge Distillation from A Stronger Teacher
- intro: SenseTime Research & The University of Sydney & University of Science and Technology of China
- arxiv: https://arxiv.org/abs/2205.10536
- github: https://github.com/hunto/DIST_KD
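A hedged sketch of the relation-based matching in the spirit of DIST: rather than forcing the student to reproduce the teacher's probabilities exactly, encourage high Pearson correlation between student and teacher predictions across classes (per sample) and across the batch (per class). Loss weights and temperature below are illustrative:

```python
import torch.nn.functional as F

def pearson_distance(a, b, dim):
    """1 - Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    a = a - a.mean(dim, keepdim=True)
    b = b - b.mean(dim, keepdim=True)
    return (1.0 - F.cosine_similarity(a, b, dim=dim)).mean()

def dist_loss(logits_s, logits_t, T=1.0, beta=1.0, gamma=1.0):
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    inter = pearson_distance(p_s, p_t, dim=1)   # per-sample relation across classes
    intra = pearson_distance(p_s, p_t, dim=0)   # per-class relation across the batch
    return beta * inter + gamma * intra
```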
Resources
Awesome Knowledge-Distillation
- github: https://github.com/FLHonker/Awesome-Knowledge-Distillation