Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning


Distilling the Knowledge in a Neural Network
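  • The core idea of this paper is training a small student on the teacher's temperature-softened output distribution alongside the hard labels. A minimal sketch of the Hinton-style distillation loss in plain Python (the function names and the default `T`/`alpha` values here are illustrative choices, not from the paper's code):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces a softer distribution,
    # exposing the teacher's relative confidence across wrong classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.9):
    """Hinton-style KD loss:
    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(hard label).
    The T^2 factor keeps the soft-target gradient magnitude comparable
    as T changes."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    hard_probs = softmax(student_logits, 1.0)
    ce = -math.log(hard_probs[true_label])
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

  • When the student already matches the teacher's logits, the KL term vanishes and only the hard-label cross-entropy remains; in practice `T` between 2 and 20 and a large `alpha` are common starting points.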

Deep Model Compression: Distilling Knowledge from Noisy Teachers

A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

Data-Free Knowledge Distillation for Deep Neural Networks

Knowledge Projection for Deep Neural Networks

Moonshine: Distilling with Cheap Convolutions

model_compression: Implementation of model compression with the knowledge distillation method

Neural Network Distiller

Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students

Improving Knowledge Distillation with Supporting Adversarial Samples

Recurrent knowledge distillation

Knowledge Distillation by On-the-Fly Native Ensemble

Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

Correlation Congruence for Knowledge Distillation

Similarity-Preserving Knowledge Distillation

Highlight Every Step: Knowledge Distillation via Collaborative Teaching

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Revisit Knowledge Distillation: a Teacher-free Framework

On the Efficacy of Knowledge Distillation

Training convolutional neural networks with cheap convolutions and online distillation

Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

  • intro: Xi’an Jiaotong University & Meituan
  • keywords: Knowledge Adjustment (KA), Dynamic Temperature Distillation (DTD)

QKD: Quantization-aware Knowledge Distillation

Explaining Knowledge Distillation by Quantifying the Knowledge

Knowledge distillation via adaptive instance normalization

Distilling Knowledge from Graph Convolutional Networks

Regularizing Class-wise Predictions via Self-knowledge Distillation

Online Knowledge Distillation with Diverse Peers

Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Peer Collaborative Learning for Online Knowledge Distillation

Knowledge Distillation for Multi-task Learning

Differentiable Feature Aggregation Search for Knowledge Distillation

Prime-Aware Adaptive Distillation

Knowledge Transfer via Dense Cross-Layer Mutual-Distillation

Matching Guided Distillation

Domain Adaptation Through Task Distillation

Spherical Knowledge Distillation

In Defense of Feature Mimicking for Knowledge Distillation

Online Ensemble Model Compression using Knowledge Distillation

Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation


Awesome Knowledge-Distillation