Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning

Papers

Distilling the Knowledge in a Neural Network

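The recipe in this paper trains the student on a weighted mix of the usual cross-entropy against hard labels and a KL term between teacher and student outputs softened by a temperature T. Below is a minimal PyTorch-style sketch of that soft-target loss; the default values for `T` and `alpha` are illustrative choices, not settings from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Sketch of the soft-target distillation loss (Hinton et al., 2015).

    student_logits, teacher_logits: (batch, num_classes) raw logits
    labels: (batch,) ground-truth class indices
    T: temperature used to soften both distributions
    alpha: weight on the distillation term vs. the hard-label term (assumed default)
    """
    # KL divergence between softened teacher and student distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures,
    # as noted in the paper.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce
```
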
Deep Model Compression: Distilling Knowledge from Noisy Teachers

A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

https://arxiv.org/abs/1709.00513

Data-Free Knowledge Distillation for Deep Neural Networks

https://arxiv.org/abs/1710.07535

Knowledge Projection for Deep Neural Networks

https://arxiv.org/abs/1710.09505

Moonshine: Distilling with Cheap Convolutions

https://arxiv.org/abs/1711.02613

model_compression: implementation of model compression with the knowledge distillation method

Neural Network Distiller

Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students

Improving Knowledge Distillation with Supporting Adversarial Samples

https://arxiv.org/abs/1805.05532

Recurrent knowledge distillation

Knowledge Distillation by On-the-Fly Native Ensemble

https://arxiv.org/abs/1806.04606

Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

Correlation Congruence for Knowledge Distillation

Similarity-Preserving Knowledge Distillation

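Similarity-preserving distillation trains the student to reproduce the teacher's pairwise similarity structure within a mini-batch rather than its raw features, so the two networks may even have different channel counts. A minimal sketch of such a loss is below; the exact normalization and weighting used in the paper may differ from these assumptions.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(feat_s, feat_t):
    """Sketch of a similarity-preserving distillation loss.

    feat_s, feat_t: activation maps of shape (batch, C, H, W) from the student
    and teacher; only the batch dimension must match.
    """
    b = feat_s.size(0)

    # Pairwise similarity matrices over the batch, row-normalized.
    g_s = torch.mm(feat_s.view(b, -1), feat_s.view(b, -1).t())
    g_s = F.normalize(g_s, p=2, dim=1)
    g_t = torch.mm(feat_t.view(b, -1), feat_t.view(b, -1).t())
    g_t = F.normalize(g_t, p=2, dim=1)

    # Mean squared difference between the two similarity matrices.
    return (g_s - g_t).pow(2).sum() / (b * b)
```
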
Highlight Every Step: Knowledge Distillation via Collaborative Teaching

https://arxiv.org/abs/1907.09643

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

https://arxiv.org/abs/1909.08097

Revisit Knowledge Distillation: a Teacher-free Framework

On the Efficacy of Knowledge Distillation

Training convolutional neural networks with cheap convolutions and online distillation

Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

https://arxiv.org/abs/1911.05329

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

  • intro: Xi’an Jiaotong University & Meituan
  • keywords: Knowledge Adjustment (KA), Dynamic Temperature Distillation (DTD)
  • arxiv: https://arxiv.org/abs/1911.07471

QKD: Quantization-aware Knowledge Distillation

https://arxiv.org/abs/1911.12491

Explaining Knowledge Distillation by Quantifying the Knowledge

https://arxiv.org/abs/2003.03622

Knowledge distillation via adaptive instance normalization

https://arxiv.org/abs/2003.04289

Distillating Knowledge from Graph Convolutional Networks

Regularizing Class-wise Predictions via Self-knowledge Distillation

Online Knowledge Distillation with Diverse Peers

Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Peer Collaborative Learning for Online Knowledge Distillation

Knowledge Distillation for Multi-task Learning

Differentiable Feature Aggregation Search for Knowledge Distillation

https://arxiv.org/abs/2008.00506

Prime-Aware Adaptive Distillation

Knowledge Transfer via Dense Cross-Layer Mutual-Distillation

Matching Guided Distillation

Domain Adaptation Through Task Distillation

Spherical Knowledge Distillation

https://arxiv.org/abs/2010.07485

In Defense of Feature Mimicking for Knowledge Distillation

Online Ensemble Model Compression using Knowledge Distillation

Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation

Decoupled Knowledge Distillation

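Decoupled Knowledge Distillation splits the classical KD loss into a target-class term (TCKD) and a non-target-class term (NCKD) that are weighted independently. The sketch below illustrates one way to write that decomposition; the `alpha`, `beta`, and `T` values, the clamping, and other details are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def decoupled_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=1.0, beta=8.0):
    """Sketch of a decoupled KD loss: separate target / non-target KL terms."""
    num_classes = student_logits.size(1)
    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    target_mask = F.one_hot(labels, num_classes=num_classes).bool()

    # Target-class term: binary (target vs. rest) distributions for both networks.
    pt_s = p_s[target_mask]          # student probability of the true class, shape (batch,)
    pt_t = p_t[target_mask]          # teacher probability of the true class
    bin_s = torch.stack([pt_s, 1.0 - pt_s], dim=1).clamp_min(1e-8)
    bin_t = torch.stack([pt_t, 1.0 - pt_t], dim=1)
    tckd = F.kl_div(bin_s.log(), bin_t, reduction="batchmean") * (T * T)

    # Non-target-class term: distributions over the remaining classes, renormalized.
    ns = p_s.masked_fill(target_mask, 0.0)
    nt = p_t.masked_fill(target_mask, 0.0)
    ns = (ns / ns.sum(dim=1, keepdim=True).clamp_min(1e-8)).clamp_min(1e-8)
    nt = nt / nt.sum(dim=1, keepdim=True).clamp_min(1e-8)
    nckd = F.kl_div(ns.log(), nt, reduction="batchmean") * (T * T)

    return alpha * tckd + beta * nckd
```
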
Knowledge Distillation via the Target-aware Transformer

Knowledge Distillation from A Stronger Teacher

Resources

Awesome Knowledge-Distillation

https://github.com/FLHonker/Awesome-Knowledge-Distillation