Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning


High-Performance Neural Networks for Visual Object Classification

Predicting Parameters in Deep Learning

Neurons vs Weights Pruning in Artificial Neural Networks

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

cuDNN: Efficient Primitives for Deep Learning

Efficient and accurate approximations of nonlinear convolutional networks

Convolutional Neural Networks at Constrained Time Cost

Flattened Convolutional Neural Networks for Feedforward Acceleration

Compressing Deep Convolutional Networks using Vector Quantization

  • intro: “this paper showed that vector quantization had a clear advantage over matrix factorization methods in compressing fully-connected layers.”
  • arxiv:
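
As a rough illustration of the idea behind weight quantization, the sketch below clusters the weights of a layer into a small codebook with scalar k-means and stores only a cluster index per weight. It is not taken from the paper: the function name `kmeans_quantize`, the evenly spaced initialization, and the toy weights are all assumptions made for this example.

```python
def kmeans_quantize(weights, k, iters=20):
    """Cluster scalar weights into k shared values (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    # Assumed initialization: centroids evenly spaced over the weight range.
    if k == 1:
        codebook = [(lo + hi) / 2.0]
    else:
        codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    codes = [0] * len(weights)
    for _ in range(iters):
        # Assignment step: each weight picks its nearest centroid.
        codes = [min(range(k), key=lambda j: abs(w - codebook[j]))
                 for w in weights]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [w for w, c in zip(weights, codes) if c == j]
            if members:
                codebook[j] = sum(members) / len(members)
    return codebook, codes


# Toy fully-connected weights, flattened into one list.
weights = [-1.0, -0.9, -1.1, 0.0, 0.1, -0.1, 1.0, 0.9, 1.1]
codebook, codes = kmeans_quantize(weights, k=3)
# Reconstruct the layer from the codebook and the per-weight indices.
reconstructed = [codebook[c] for c in codes]
```

With 3 centroids each weight needs only a 2-bit index plus the shared codebook, instead of a full 32-bit float per weight.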

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

  • intro: “a low-rank CP-decomposition was adopted to transform a convolutional layer into multiple layers of lower complexity”
  • arxiv:

Deep Fried Convnets

  • intro: “fully-connected layers were replaced by a single ‘Fastfood’ layer for end-to-end training with convolutional layers”
  • arxiv:

Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Compressing Neural Networks with the Hashing Trick

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Accelerating Very Deep Convolutional Networks for Classification and Detection

Fast ConvNets Using Group-wise Brain Damage

  • intro: “applied group-wise pruning to the convolutional tensor to decompose it into the multiplications of thinned dense matrices”
  • arxiv:

Learning both Weights and Connections for Efficient Neural Networks

Data-free parameter pruning for Deep Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Structured Transforms for Small-Footprint Deep Learning

ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines

Reducing the Training Time of Neural Networks by Partitioning

Convolutional neural networks with low-rank regularization

CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Convolutional Tables Ensemble: classification in microseconds

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size


Lab41 Reading Group: SqueezeNet


SqueezeNet Keras Dogs vs. Cats demo

Convolutional Neural Networks using Logarithmic Data Representation

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

Hardware-oriented Approximation of Convolutional Neural Networks

Deep Neural Networks Under Stress

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

  • intro: “for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).”
  • arxiv:

Functional Hashing for Compressing Neural Networks

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Learning Structured Sparsity in Deep Neural Networks

Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial “Bottleneck” Structure

Dynamic Network Surgery for Efficient DNNs

Scalable Compression of Deep Neural Networks

Pruning Filters for Efficient ConvNets

Accelerating Deep Convolutional Networks using low-precision and sparsity

Deep Model Compression: Distilling Knowledge from Noisy Teachers

Fixed-point Factorized Networks

Ultimate tensorization: compressing convolutional and FC layers alike

Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

  • intro: “the energy consumption of AlexNet and GoogLeNet are reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss”
  • arxiv:

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

LCNN: Lookup-based Convolutional Neural Network

Deep Tensor Convolution on Multicores

  • intro: present the first practical CPU implementation of tensor convolution optimized for deep networks of small kernels
  • arxiv:

Training Sparse Neural Networks

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Deep Learning with INT8 Optimization on Xilinx Devices

Parameter Compression of Recurrent Neural Networks and Degredation of Short-term Memory

An OpenCL(TM) Deep Learning Accelerator on Arria 10

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

Deep Learning with Low Precision by Half-wave Gaussian Quantization

DL-gleaning: An Approach For Improving Inference Speed And Accuracy

Energy Saving Additive Neural Network

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Soft Weight-Sharing for Neural Network Compression

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Bayesian Compression for Deep Learning

A Kernel Redundancy Removing Policy for Convolutional Neural Network

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

SEP-Nets: Small and Effective Pattern Networks

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

MEC: Memory-efficient Convolution for Deep Neural Network

Quantized Neural Networks

Quantized Convolutional Neural Networks for Mobile Devices

Training Quantized Nets: A Deeper Understanding

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Binary Convolutional Neural Networks

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

Espresso: Efficient Forward Propagation for BCNNs

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Accelerating / Fast Algorithms

Fast Algorithms for Convolutional Neural Networks

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Channel Pruning for Accelerating Very Deep Neural Networks

Knowledge Distilling / Knowledge Transfer

Distilling the Knowledge in a Neural Network

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer

Code Optimization

Production Deep Learning with NVIDIA GPU Inference Engine

speed improvement by merging batch normalization and scale #5

Add a tool to merge ‘Conv-BN-Scale’ into a single ‘Conv’ layer.
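
The merge works because, at inference time, batch normalization and the following scale step are a fixed per-channel affine transform that can be absorbed into the preceding layer's weights and bias. The sketch below shows the arithmetic on a plain linear layer (a convolution folds the same way, one scale factor per output channel). It is not the linked tool's code: `fold_batchnorm`, `linear`, `batchnorm`, and the sample numbers are assumptions for illustration, with `gamma` and `beta` standing in for the scale layer's parameters.

```python
import math


def fold_batchnorm(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold per-output-channel BN statistics and scale into weight and bias."""
    folded_w, folded_b = [], []
    for w_row, b, m, v, g, bt in zip(weight, bias, mean, var, gamma, beta):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_row])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b


def linear(weight, bias, x):
    """y = W x + b, standing in for a convolution at one spatial position."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def batchnorm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-time batch normalization followed by scale and shift."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, m, v, g, bt in zip(y, mean, var, gamma, beta)]


# Hypothetical layer with 2 output channels and 3 inputs.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
mean, var = [0.3, -0.4], [1.2, 0.8]
gamma, beta = [1.1, 0.9], [0.05, -0.05]
x = [1.0, 2.0, -1.0]

reference = batchnorm(linear(W, b, x), mean, var, gamma, beta)
W_f, b_f = fold_batchnorm(W, b, mean, var, gamma, beta)
merged = linear(W_f, b_f, x)
```

After folding, `merged` matches `reference` up to floating-point rounding, so the normalization and scale steps no longer need to run as separate layers.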


Accelerate Convolutional Neural Networks


OptNet - reducing memory usage in torch neural networks

NNPACK: Acceleration package for neural networks on multi-core CPUs

Deep Compression on AlexNet

Tiny Darknet

CACU: Calculate deep convolution neurAl network on Cell Unit

model_compression: Implementation of model compression with knowledge distilling method

keras_compressor: Model Compression CLI Tool for Keras


Neural Networks Are Impressively Good At Compression

“Mobile friendly” deep convolutional neural networks

Lab41 Reading Group: Deep Compression

Accelerating Machine Learning

Compressing and regularizing deep neural networks

Talks / Videos

Deep compression and EIE: Deep learning model compression, design space exploration and hardware acceleration

Deep Compression, DSD Training and EIE: Deep Neural Network Model Compression, Regularization and Hardware Acceleration

Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation