Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning


High-Performance Neural Networks for Visual Object Classification

Predicting Parameters in Deep Learning

Neurons vs Weights Pruning in Artificial Neural Networks

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

cuDNN: Efficient Primitives for Deep Learning

Efficient and accurate approximations of nonlinear convolutional networks

Convolutional Neural Networks at Constrained Time Cost

Flattened Convolutional Neural Networks for Feedforward Acceleration

Compressing Deep Convolutional Networks using Vector Quantization

  • intro: “this paper showed that vector quantization had a clear advantage over matrix factorization methods in compressing fully-connected layers.”
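The simplest member of that family is scalar quantization. Cluster the weights of a fully-connected layer with k-means, then store a small codebook plus one short index per weight. The paper also studies product and residual quantization, which this sketch does not cover. Below is a minimal pure-Python sketch under that assumption. `kmeans_quantize` and `dequantize` are hypothetical helper names, not code from the paper:

```python
import random


def kmeans_quantize(weights, k, iters=20, seed=0):
    """Cluster scalar weights into k centroids with 1-D k-means (Lloyd's algorithm).

    Returns (codebook, indices) so that codebook[indices[i]] approximates weights[i].
    Assumes the layer contains at least k distinct weight values.
    """
    rng = random.Random(seed)
    # Initialize the centroids with k distinct weights sampled from the layer.
    codebook = rng.sample(sorted(set(weights)), k)
    indices = [0] * len(weights)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every weight.
        indices = [min(range(k), key=lambda c: abs(w - codebook[c])) for w in weights]
        # Update step: move each centroid to the mean of the weights assigned to it.
        for c in range(k):
            members = [w for w, idx in zip(weights, indices) if idx == c]
            if members:
                codebook[c] = sum(members) / len(members)
    return codebook, indices


def dequantize(codebook, indices):
    """Rebuild the approximate weight list from the codebook and per-weight indices."""
    return [codebook[i] for i in indices]
```

With 16 centroids, each index fits in 4 bits instead of a 32-bit float. Index storage therefore shrinks about 8x before the codebook is counted. Fine-tuning after quantization is not shown.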
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

  • intro: “a low-rank CP-decomposition was adopted to transform a convolutional layer into multiple layers of lower complexity”
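In Lebedev et al.'s scheme, a d×d kernel with S input and T output channels is approximated by R rank-one terms: K(i, j, s, t) ≈ Σ_r Kx(i, r) Ky(j, r) Ks(s, r) Kt(t, r). The result runs as four cheap layers:

- a 1×1 convolution from S to R channels,
- a d×1 convolution applied to each of the R channels separately,
- a 1×d convolution applied to each of the R channels separately,
- a 1×1 convolution from R to T channels.

The helper names below are hypothetical. The sketch only counts multiply-adds per output pixel and ignores stride, padding and memory traffic:

```python
def conv_macs(d, s, t):
    """Multiply-adds per output pixel for a dense d x d convolution from S to T channels."""
    return d * d * s * t


def cp_conv_macs(d, s, t, r):
    """Multiply-adds per output pixel for the rank-R CP-decomposed replacement.

    Stages: 1x1 (S -> R), d x 1 per channel (R), 1 x d per channel (R), 1x1 (R -> T).
    """
    return s * r + d * r + d * r + r * t
```

Take a 3×3 layer with 256 input and output channels and R = 64. This gives 589,824 versus 33,152 multiply-adds per output pixel, roughly 17.8x fewer. The accuracy cost of a given rank is not captured by this count.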
Deep Fried Convnets

  • intro: “fully-connected layers were replaced by a single ‘Fastfood’ layer for end-to-end training with convolutional layers”
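A Fastfood layer computes V x = S H G Π H B x. Here H is a Walsh-Hadamard matrix, Π is a fixed random permutation, and S, G, B are diagonal matrices. The diagonals are learned in the adaptive variant used by Deep Fried Convnets. An n×n product therefore costs O(n log n) time and O(n) parameters.

Below is a minimal sketch assuming n is a power of two. The helper names are hypothetical and the initial S is a placeholder constant. The normalization constant of the original Fastfood construction is folded into S:

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n); len(x) must be a power of 2."""
    x = list(x)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return x


def make_fastfood(n, seed=0):
    """Sample the parameters of one Fastfood block (hypothetical helper)."""
    rng = random.Random(seed)
    B = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # random sign flips
    perm = list(range(n))
    rng.shuffle(perm)                                 # fixed permutation Pi
    G = [rng.gauss(0.0, 1.0) for _ in range(n)]       # Gaussian diagonal
    S = [1.0 / math.sqrt(n)] * n                      # placeholder scaling diagonal
    return S, G, perm, B


def fastfood_apply(x, S, G, perm, B):
    """Compute S H G Pi H B x without ever forming the n x n matrix."""
    y = [b * v for b, v in zip(B, x)]     # B x
    y = fwht(y)                           # H B x
    y = [y[p] for p in perm]              # Pi H B x
    y = [g * v for g, v in zip(G, y)]     # G Pi H B x
    y = fwht(y)                           # H G Pi H B x
    return [s * v for s, v in zip(S, y)]  # S H G Pi H B x
```

Only the diagonals and the permutation are stored. For equal input and output width, a 4096×4096 fully-connected layer (about 16.8 million weights) would shrink to roughly 3 × 4096 learned values.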
Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Compressing Neural Networks with the Hashing Trick

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Accelerating Very Deep Convolutional Networks for Classification and Detection

Fast ConvNets Using Group-wise Brain Damage

  • intro: “applied group-wise pruning to the convolutional tensor to decompose it into the multiplications of thinned dense matrices”
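In the group-wise scheme, the convolution is lowered to a matrix product (im2col). Each group collects all weights that multiply the same column of the lowered input, i.e. one (input channel, kernel row, kernel column) position across every output filter. Zeroing a whole group deletes a row of the weight matrix and the matching column of the input patch matrix, so the remaining product is smaller but still dense.

The paper induces this structure during training with group-sparsity regularization. The sketch below instead applies a simple post-hoc rule that keeps the groups with the largest L2 norms, only to show the thinning itself. `prune_groups` and `keep_ratio` are hypothetical names:

```python
import math


def group_norms(W):
    """L2 norm of each row of the lowered weight matrix W (shape: S*d*d x T).

    Row k holds every output filter's weight for one (input channel, kernel row, kernel col).
    """
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def prune_groups(W, keep_ratio):
    """Keep the rows (groups) with the largest norms; return kept row indices and thinned W."""
    norms = group_norms(W)
    n_keep = max(1, round(keep_ratio * len(W)))
    ranked = sorted(range(len(W)), key=lambda k: norms[k], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [W[k] for k in kept]


def matmul(A, B):
    """Plain dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def thinned_conv(X, kept, W_thin):
    """Multiply only the surviving columns of the im2col patch matrix X by the thinned W."""
    X_thin = [[row[k] for k in kept] for row in X]
    return matmul(X_thin, W_thin)
```

When the pruned groups are exactly zero, the thinned product equals the full one while skipping the removed columns entirely.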
Learning both Weights and Connections for Efficient Neural Networks

Data-free parameter pruning for Deep Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Structured Transforms for Small-Footprint Deep Learning

ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines

Reducing the Training Time of Neural Networks by Partitioning

Convolutional neural networks with low-rank regularization

CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Convolutional Tables Ensemble: classification in microseconds

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size


Lab41 Reading Group: SqueezeNet


SqueezeNet Keras Dogs vs. Cats demo

Convolutional Neural Networks using Logarithmic Data Representation

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

Hardware-oriented Approximation of Convolutional Neural Networks

Deep Neural Networks Under Stress

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

  • intro: “for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).”
Functional Hashing for Compressing Neural Networks

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Learning Structured Sparsity in Deep Neural Networks

Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial “Bottleneck” Structure

Dynamic Network Surgery for Efficient DNNs

Scalable Compression of Deep Neural Networks

Pruning Filters for Efficient ConvNets

Accelerating Deep Convolutional Networks using low-precision and sparsity

Fixed-point Factorized Networks

Ultimate tensorization: compressing convolutional and FC layers alike

Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

  • intro: “the energy consumption of AlexNet and GoogLeNet are reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss”
Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

LCNN: Lookup-based Convolutional Neural Network

Deep Tensor Convolution on Multicores

  • intro: presents the first practical CPU implementation of tensor convolution optimized for deep networks with small kernels
Training Sparse Neural Networks

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Deep Learning with INT8 Optimization on Xilinx Devices

Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory

An OpenCL(TM) Deep Learning Accelerator on Arria 10

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

Deep Learning with Low Precision by Half-wave Gaussian Quantization

DL-gleaning: An Approach For Improving Inference Speed And Accuracy

Energy Saving Additive Neural Network

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Soft Weight-Sharing for Neural Network Compression

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Bayesian Compression for Deep Learning

A Kernel Redundancy Removing Policy for Convolutional Neural Network

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

SEP-Nets: Small and Effective Pattern Networks

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

MEC: Memory-efficient Convolution for Deep Neural Network

Data-Driven Sparse Structure Selection for Deep Neural Networks

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

An End-to-End Compression Framework Based on Convolutional Neural Networks

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Domain-adaptive deep network compression

Binary-decomposed DCNN for accelerating computation and compressing model without retraining

Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

Improving Efficiency in Convolutional Neural Network with Multilinear Filters

A Survey of Model Compression and Acceleration for Deep Neural Networks

  • intro: IEEE Signal Processing Magazine. IBM Thomas J. Watson Research Center & Tsinghua University & Huazhong University of Science and Technology
Compression-aware Training of Deep Networks

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

Reducing Deep Network Complexity with Fourier Transform Methods

EffNet: An Efficient Structure for Convolutional Neural Networks

Universal Deep Neural Network Compression

Paraphrasing Complex Network: Network Compression via Factor Transfer

Compressing Neural Networks using the Variational Information Bottleneck

Adversarial Network Compression

Expanding a robot’s life: Low power object recognition via FPGA-based DCNN deployment


ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

Neuron Pruning for Compressing Deep Networks using Maxout Architectures

Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

Prune the Convolutional Neural Networks with Sparse Shrink

NISP: Pruning Networks using Neuron Importance Score Propagation

Automated Pruning for Deep Neural Network Compression

Learning to Prune Filters in Convolutional Neural Networks

Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks

Quantized Neural Networks

Quantized Convolutional Neural Networks for Mobile Devices

Training Quantized Nets: A Deeper Understanding

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
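That paper maps each real value r to an 8-bit integer q through an affine scheme, r = S(q - Z). The real scale S and the integer zero-point Z are chosen so that real 0 is represented exactly. The sketch below derives S and Z from an observed min/max range. That is a common choice, assumed here rather than taken from the paper's training procedure, and the function names are hypothetical:

```python
def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Scale and zero-point for the affine mapping r = scale * (q - zero_point)."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must contain real zero
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range: any positive scale works
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep Z a valid quantized value
    return scale, zero_point


def affine_quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Round r onto the integer grid and clamp to the representable range."""
    q = int(round(r / scale)) + zero_point
    return max(qmin, min(qmax, q))


def affine_dequantize(q, scale, zero_point):
    """Recover the approximate real value represented by q."""
    return scale * (q - zero_point)
```

For the range [-1, 3], this gives S = 4/255 and Z = 64. Every value in the range round-trips with an error of at most S/2, and 0.0 maps exactly to Z.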

Deep Neural Network Compression with Single and Multiple Level Quantization

Binary Convolutional Neural Networks / Binarized Neural Networks

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
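Networks whose weights and activations are constrained to +1 or -1 can replace multiply-accumulate with bit operations. Pack each vector into a bitmask, XNOR the masks, and count the set bits. Matching positions contribute +1 and mismatches -1, so the dot product is 2 × matches - n. Below is a minimal sketch of that identity using Python integers as bit vectors. The helper names are hypothetical, and XNOR-Net's real-valued scaling factors are omitted:

```python
def pack_signs(values):
    """Encode a +1/-1 vector as an int bitmask: bit i is set where values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError("binary vectors may contain only +1 and -1")
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n
```

On hardware, the same computation maps to a word-wide XNOR followed by a popcount instruction. That is the main source of the speed-ups these papers target.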

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

Espresso: Efficient Forward Propagation for BCNNs

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Embedded Binarized Neural Networks

Compact Hash Code Learning with Binary Deep Neural Network

Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning

From Hashing to CNNs: Training BinaryWeight Networks via Hashing

Accelerating / Fast Algorithms

Fast Algorithms for Convolutional Neural Networks

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Channel Pruning for Accelerating Very Deep Neural Networks

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

Learning Efficient Convolutional Networks through Network Slimming

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Accelerating Convolutional Neural Networks for Continuous Mobile Vision via Cache Reuse

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

SBNet: Sparse Blocks Network for Fast Inference

Accelerating deep neural networks with tensor decompositions

A Survey on Acceleration of Deep Convolutional Neural Networks

Recurrent Residual Module for Fast Inference in Videos

Knowledge Distilling / Knowledge Transfer

Distilling the Knowledge in a Neural Network
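Hinton et al. train a small student on the teacher's class probabilities softened by a temperature T, softmax(z / T). This soft term is usually mixed with the ordinary hard-label loss. The soft term is multiplied by T² so its gradient scale stays comparable when T changes. Below is a minimal sketch of that objective for a single example. The function names and the `alpha` weighting are illustrative assumptions:

```python
import math


def log_softmax(logits, temperature=1.0):
    """log(softmax(logits / temperature)), computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def softmax(logits, temperature=1.0):
    """Softened class probabilities; higher temperatures give flatter distributions."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Weighted sum of the soft-target term (scaled by T**2) and the hard-label term."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # Cross-entropy between the teacher's soft targets and the student's softened output.
    soft = -sum(p * lq for p, lq in zip(p_teacher, log_p_student))
    # Ordinary cross-entropy against the true label at temperature 1.
    hard = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard
```

With `alpha=1.0`, only the soft term remains. It is smallest when the student's softened distribution matches the teacher's.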

Deep Model Compression: Distilling Knowledge from Noisy Teachers

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

Data-Free Knowledge Distillation for Deep Neural Networks

Knowledge Projection for Deep Neural Networks

Moonshine: Distilling with Cheap Convolutions

model_compression: Implementation of model compression with knowledge distilling method

Neural Network Distiller

Code Optimization

Production Deep Learning with NVIDIA GPU Inference Engine

speed improvement by merging batch normalization and scale #5

Add a tool to merge ‘Conv-BN-Scale’ into a single ‘Conv’ layer.
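At inference time, BatchNorm and Scale reduce to fixed per-channel affine maps, so they can be absorbed into the preceding convolution's weights and bias. For each output channel t, with factor = γ_t / sqrt(var_t + eps):

- W'_t = W_t × factor
- b'_t = (b_t - mean_t) × factor + β_t

Below is a minimal sketch of that folding, assuming plain per-channel lists rather than any framework's blob layout. The function name is hypothetical, the default `eps` is an assumption that must match the BatchNorm layer being merged, and a conv without a bias term should pass zeros:

```python
import math


def fold_bn_into_conv(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold inference-time BatchNorm + Scale into the preceding convolution.

    weights[t] is the flattened kernel of output channel t; the other arguments are
    per-output-channel lists. The returned (new_weights, new_bias) satisfy
    conv'(x) == gamma * (conv(x) - mean) / sqrt(var + eps) + beta for every channel.
    """
    new_weights, new_bias = [], []
    for t, kernel in enumerate(weights):
        factor = gamma[t] / math.sqrt(var[t] + eps)
        new_weights.append([w * factor for w in kernel])       # rescale the kernel
        new_bias.append((bias[t] - mean[t]) * factor + beta[t])  # shift the bias
    return new_weights, new_bias
```

The merged layer performs one convolution instead of a convolution plus two elementwise passes. That is the speed improvement the entries above refer to.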

Low-memory GEMM-based convolution algorithms for deep neural networks


Accelerate Convolutional Neural Networks


OptNet - reducing memory usage in torch neural networks

NNPACK: Acceleration package for neural networks on multi-core CPUs

Deep Compression on AlexNet

Tiny Darknet

CACU: Calculate deep convolution neurAl network on Cell Unit

keras_compressor: Model Compression CLI Tool for Keras


Neural Networks Are Impressively Good At Compression

“Mobile friendly” deep convolutional neural networks

Lab41 Reading Group: Deep Compression

Accelerating Machine Learning

Compressing and regularizing deep neural networks

Talks / Videos

Deep compression and EIE: Deep learning model compression, design space exploration and hardware acceleration

Deep Compression, DSD Training and EIE: Deep Neural Network Model Compression, Regularization and Hardware Acceleration

Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation