Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning


High-Performance Neural Networks for Visual Object Classification

Predicting Parameters in Deep Learning

Neurons vs Weights Pruning in Artificial Neural Networks

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

cuDNN: Efficient Primitives for Deep Learning

Efficient and accurate approximations of nonlinear convolutional networks

Convolutional Neural Networks at Constrained Time Cost

Flattened Convolutional Neural Networks for Feedforward Acceleration

Compressing Deep Convolutional Networks using Vector Quantization

  • intro: “this paper showed that vector quantization had a clear advantage over matrix factorization methods in compressing fully-connected layers.”
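The quoted advantage can be illustrated with a minimal numpy sketch of vector quantization on a fully-connected weight matrix. This is an illustrative sketch under assumed parameters, not the paper's code; `kmeans`, `quantize_fc`, `sub_dim`, and `k` are invented names for this example:

```python
import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means on rows of `vectors` (illustrative, not optimized)."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # assign each sub-vector to its nearest codeword
        dist = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = vectors[assign == j].mean(0)
    return centers, assign

def quantize_fc(W, sub_dim=4, k=16):
    """Vector-quantize a fully-connected weight matrix W: split it into
    sub-vectors of length `sub_dim` and replace each by one of k codewords."""
    rows, cols = W.shape
    assert cols % sub_dim == 0
    subs = W.reshape(-1, sub_dim)              # all sub-vectors
    centers, assign = kmeans(subs, k)
    W_q = centers[assign].reshape(rows, cols)  # reconstruction from codebook
    return W_q, centers, assign

W = np.random.default_rng(1).normal(size=(64, 32)).astype(np.float32)
W_q, codebook, codes = quantize_fc(W, sub_dim=4, k=32)
```

Storage then consists of the small codebook plus one integer index per sub-vector, instead of one float per weight.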

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

  • intro: “a low-rank CP-decomposition was adopted to transform a convolutional layer into multiple layers of lower complexity”
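The low-rank idea can be sketched in numpy. Note this uses truncated SVD on a matricized kernel as a simpler stand-in for the paper's CP decomposition (which factorizes the 4-way tensor, typically via ALS); `low_rank_conv` and the shapes are assumptions for illustration:

```python
import numpy as np

def low_rank_conv(K, rank):
    """Approximate a conv kernel K of shape (out_ch, in_ch, kh, kw) by two
    smaller factors via truncated SVD on the matricized kernel."""
    out_ch, in_ch, kh, kw = K.shape
    M = K.reshape(out_ch, in_ch * kh * kw)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # (out_ch, rank): 1x1 recombination of outputs
    B = Vt[:rank]                # (rank, in_ch*kh*kw): small basis filters
    return A, B

K = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
A, B = low_rank_conv(K, rank=16)
K_approx = (A @ B).reshape(K.shape)
# cost per output position drops from out_ch*in_ch*kh*kw multiply-adds
# to rank*in_ch*kh*kw + out_ch*rank
```

The trade-off is the usual one: smaller `rank` means fewer operations but a larger approximation error, which fine-tuning then recovers.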

Deep Fried Convnets

  • intro: “fully-connected layers were replaced by a single ‘Fastfood’ layer for end-to-end training with convolutional layers”
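A rough numpy sketch of one Fastfood block (V x ≈ H G Π H B x, replacing a dense d×d Gaussian matrix with diagonal and permutation matrices plus Hadamard transforms, O(d log d) instead of O(d²)). This is an assumption-laden illustration, not the Deep Fried Convnets training code; it omits the paper's chi-distributed scaling S and σ normalization:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def fastfood_transform(x, seed=0):
    """One (simplified) Fastfood block: random signs B, Hadamard, random
    permutation P, Gaussian diagonal G, Hadamard again."""
    d = len(x)
    rng = np.random.default_rng(seed)
    B = rng.choice([-1.0, 1.0], size=d)   # random sign flips
    G = rng.normal(size=d)                # Gaussian diagonal scaling
    P = rng.permutation(d)                # random permutation
    v = fwht(B * x)
    v = G * v[P]
    v = fwht(v)
    return v / d                          # crude normalization

x = np.arange(8.0)
y = fastfood_transform(x)
```

Only O(d) random parameters are stored, which is what makes the layer a cheap drop-in for a dense fully-connected matrix.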

Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Compressing Neural Networks with the Hashing Trick

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Accelerating Very Deep Convolutional Networks for Classification and Detection

Fast ConvNets Using Group-wise Brain Damage

  • intro: “applied group-wise pruning to the convolutional tensor to decompose it into the multiplications of thinned dense matrices”
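Group-wise pruning of a conv tensor can be sketched as follows: each group (one input channel × spatial position, shared across all output filters) is ranked by norm and the weakest groups are zeroed, so whole rows of the im2col matrix disappear and the remaining multiplication is a thinner dense matmul. A hedged numpy sketch, not the paper's method; `groupwise_prune` and `keep_ratio` are invented names:

```python
import numpy as np

def groupwise_prune(K, keep_ratio=0.5):
    """Zero the weakest groups of a conv kernel K (out_ch, in_ch, kh, kw).
    A group is one column of the (out_ch, in_ch*kh*kw) matricized kernel."""
    out_ch, in_ch, kh, kw = K.shape
    groups = K.reshape(out_ch, in_ch * kh * kw)
    norms = np.linalg.norm(groups, axis=0)      # one L2 norm per group
    k = int(len(norms) * keep_ratio)
    keep = np.argsort(norms)[-k:]               # indices of strongest groups
    mask = np.zeros_like(norms, dtype=bool)
    mask[keep] = True
    pruned = groups * mask                      # zero entire columns at once
    return pruned.reshape(K.shape), mask

K = np.random.default_rng(2).normal(size=(4, 3, 3, 3))
K_pruned, mask = groupwise_prune(K, keep_ratio=0.5)
```

Because entire columns are zeroed (rather than scattered weights), the sparsity maps directly onto dense BLAS calls over the surviving groups.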

Learning both Weights and Connections for Efficient Neural Networks

Data-free parameter pruning for Deep Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Structured Transforms for Small-Footprint Deep Learning

ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines

Reducing the Training Time of Neural Networks by Partitioning

Convolutional neural networks with low-rank regularization

CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Convolutional Tables Ensemble: classification in microseconds

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size


Lab41 Reading Group: SqueezeNet


SqueezeNet Keras Dogs vs. Cats demo

Convolutional Neural Networks using Logarithmic Data Representation

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

Hardware-oriented Approximation of Convolutional Neural Networks

Deep Neural Networks Under Stress

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

  • intro: “for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).”

Functional Hashing for Compressing Neural Networks

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Learning Structured Sparsity in Deep Neural Networks

Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial “Bottleneck” Structure

Dynamic Network Surgery for Efficient DNNs

Scalable Compression of Deep Neural Networks

Pruning Filters for Efficient ConvNets

Fixed-point Factorized Networks

Ultimate tensorization: compressing convolutional and FC layers alike

Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

  • intro: “the energy consumption of AlexNet and GoogLeNet are reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss”

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

LCNN: Lookup-based Convolutional Neural Network

Deep Tensor Convolution on Multicores

  • intro: presents the first practical CPU implementation of tensor convolution, optimized for deep networks of small kernels

Training Sparse Neural Networks

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Deep Learning with INT8 Optimization on Xilinx Devices

Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory

An OpenCL(TM) Deep Learning Accelerator on Arria 10

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

DL-gleaning: An Approach For Improving Inference Speed And Accuracy

Energy Saving Additive Neural Network

Soft Weight-Sharing for Neural Network Compression

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Bayesian Compression for Deep Learning

A Kernel Redundancy Removing Policy for Convolutional Neural Network

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

SEP-Nets: Small and Effective Pattern Networks

MEC: Memory-efficient Convolution for Deep Neural Network

Data-Driven Sparse Structure Selection for Deep Neural Networks

An End-to-End Compression Framework Based on Convolutional Neural Networks

Domain-adaptive deep network compression

Binary-decomposed DCNN for accelerating computation and compressing model without retraining

Improving Efficiency in Convolutional Neural Network with Multilinear Filters

A Survey of Model Compression and Acceleration for Deep Neural Networks

  • intro: IEEE Signal Processing Magazine. IBM Thomas J. Watson Research Center & Tsinghua University & Huazhong University of Science and Technology

Compression-aware Training of Deep Networks

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

Reducing Deep Network Complexity with Fourier Transform Methods

EffNet: An Efficient Structure for Convolutional Neural Networks

Universal Deep Neural Network Compression

Paraphrasing Complex Network: Network Compression via Factor Transfer

Compressing Neural Networks using the Variational Information Bottleneck

Adversarial Network Compression

Expanding a robot’s life: Low power object recognition via FPGA-based DCNN deployment

Accelerating CNN inference on FPGAs: A Survey

Doubly Nested Network for Resource-Efficient Inference

Smallify: Learning Network Size while Training

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

Cascaded Projection: End-to-End Network Compression and Acceleration

FALCON: Fast and Lightweight Convolution for Compressing and Accelerating CNN

Compressing Deep Neural Network

Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks

ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions

A Framework for Fast and Efficient Neural Network Compression

ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples


ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

Neuron Pruning for Compressing Deep Networks using Maxout Architectures

Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

Prune the Convolutional Neural Networks with Sparse Shrink

NISP: Pruning Networks using Neuron Importance Score Propagation

Automated Pruning for Deep Neural Network Compression

Learning to Prune Filters in Convolutional Neural Networks

Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks

A novel channel pruning method for deep neural network compression

PCAS: Pruning Channels with Attention Statistics

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Progressive Deep Neural Networks Acceleration via Soft Filter Pruning

Pruning neural networks: is it time to nip it in the bud?

Rethinking the Value of Network Pruning

Dynamic Channel Pruning: Feature Boosting and Suppression

Interpretable Convolutional Filter Pruning

Progressive Weight Pruning of Deep Neural Networks using ADMM

Pruning Deep Neural Networks using Partial Least Squares

Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices

Discrimination-aware Channel Pruning for Deep Neural Networks

Stability Based Filter Pruning for Accelerating Deep CNNs

Structured Pruning for Efficient ConvNets via Incremental Regularization

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks

A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning

Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

Pruning from Scratch

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

FNNP: Fast Neural Network Pruning Using Adaptive Batch Normalization

Pruning Filter in Filter

Low-Precision Networks

Accelerating Deep Convolutional Networks using low-precision and sparsity

Deep Learning with Low Precision by Half-wave Gaussian Quantization

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

Learning Low Precision Deep Neural Networks through Regularization

Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference

SQuantizer: Simultaneous Learning for Both Sparse and Low-precision Neural Networks

Quantized Neural Networks

Quantized Convolutional Neural Networks for Mobile Devices

Training Quantized Nets: A Deeper Understanding

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Deep Neural Network Compression with Single and Multiple Level Quantization

Quantizing deep convolutional networks for efficient inference: A whitepaper

CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)

Joint Training of Low-Precision Neural Network with Quantization Interval Parameters

Differentiable Fine-grained Quantization for Deep Neural Network Compression

HAQ: Hardware-Aware Automated Quantization

DNQ: Dynamic Network Quantization

Trained Rank Pruning for Efficient Deep Neural Networks

Training Quantized Network with Auxiliary Gradient Module

FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Bit Efficient Quantization for Deep Neural Networks

Quantization Networks

Adaptive Loss-aware Quantization for Multi-bit Networks

Distribution Adaptive INT8 Quantization for Training CNNs

Distance-aware Quantization

Binary Convolutional Neural Networks / Binarized Neural Networks

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

XNOR-Net++: Improved Binary Neural Networks

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

Espresso: Efficient Forward Propagation for BCNNs

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet


Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Embedded Binarized Neural Networks

Compact Hash Code Learning with Binary Deep Neural Network

Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning

From Hashing to CNNs: Training Binary Weight Networks via Hashing

Energy Efficient Hadamard Neural Networks

  • keywords: Binary Weight and Hadamard-transformed Image Network (BWHIN), Binary Weight Network (BWN), Hadamard-transformed Image Network (HIN)
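The BWN keyword above refers to the standard binary-weight scheme (as popularized by XNOR-Net): approximate the real-valued weights by a single scale times their signs, so convolutions reduce to additions/subtractions plus one multiply. A minimal sketch; `binarize_weights` is an invented name:

```python
import numpy as np

def binarize_weights(W):
    """BWN-style binarization: W ≈ alpha * sign(W), with alpha chosen as the
    mean absolute weight (the least-squares optimal scalar)."""
    alpha = np.abs(W).mean()
    return alpha * np.sign(W), alpha

W = np.array([[0.4, -0.2], [-0.6, 0.8]])
Wb, alpha = binarize_weights(W)
# alpha ≈ 0.5, Wb ≈ [[0.5, -0.5], [-0.5, 0.5]]
```

The binary tensor costs one bit per weight plus a single float per filter, roughly a 32x memory reduction.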

Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?

Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm

Training Compact Neural Networks with Binary Weights and Low Precision Activations

Training wide residual networks for deployment using a single bit for each weight

Composite Binary Decomposition Networks

Training Competitive Binary Neural Networks from Scratch

Regularizing Activation Distribution for Training Binarized Deep Networks

GBCNs: Genetic Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

Training Binary Neural Networks with Real-to-Binary Convolutions

Accelerating / Fast Algorithms

Fast Algorithms for Convolutional Neural Networks

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Channel Pruning for Accelerating Very Deep Neural Networks

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

Learning Efficient Convolutional Networks through Network Slimming

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Accelerating Convolutional Neural Networks for Continuous Mobile Vision via Cache Reuse

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

SBNet: Sparse Blocks Network for Fast Inference

Accelerating deep neural networks with tensor decompositions

A Survey on Acceleration of Deep Convolutional Neural Networks

Recurrent Residual Module for Fast Inference in Videos

Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications

Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA

Accelerating Deep Neural Networks with Spatial Bottleneck Modules

FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations

Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

DAC: Data-free Automatic Acceleration of Convolutional Networks

Learning Instance-wise Sparsity for Accelerating Deep Models

Code Optimization

Production Deep Learning with NVIDIA GPU Inference Engine

speed improvement by merging batch normalization and scale #5

Add a tool to merge ‘Conv-BN-Scale’ into a single ‘Conv’ layer.
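The Conv-BN-Scale merge above is simple per-channel algebra: the BN/Scale affine transform folds into the conv weights and bias. A numpy sketch of the folding (illustrative, not the actual Caffe tool; `fold_bn_into_conv` is an invented name):

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm (and a separate Scale layer) into the preceding conv:
    y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta
    becomes a single conv with rescaled weights and a new bias.
    W: (out_ch, in_ch, kh, kw); gamma/beta/mean/var are per output channel."""
    s = gamma / np.sqrt(var + eps)            # per-channel scale factor
    W_folded = W * s[:, None, None, None]     # rescale each output filter
    b_folded = (b - mean) * s + beta          # new bias absorbs BN shift
    return W_folded, b_folded

# sanity check with a 1x1 convolution (per-pixel it is just a matmul)
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3, 1, 1)); b = rng.normal(size=2)
gamma, beta = rng.normal(size=2), rng.normal(size=2)
mean, var = rng.normal(size=2), rng.random(2) + 0.1
x = rng.normal(size=3)
bn_out = gamma * (W[:, :, 0, 0] @ x + b - mean) / np.sqrt(var + 1e-5) + beta
W_f, b_f = fold_bn_into_conv(W, b, gamma, beta, mean, var)
folded_out = W_f[:, :, 0, 0] @ x + b_f
# bn_out and folded_out agree
```

Since BN uses fixed statistics at inference time, the fold is exact and removes the BN/Scale layers entirely from the deployed graph.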

Low-memory GEMM-based convolution algorithms for deep neural networks


Accelerate Convolutional Neural Networks


OptNet - reducing memory usage in torch neural networks

NNPACK: Acceleration package for neural networks on multi-core CPUs

Deep Compression on AlexNet

Tiny Darknet

CACU: Calculate deep convolution neurAl network on Cell Unit

keras_compressor: Model Compression CLI Tool for Keras


Neural Networks Are Impressively Good At Compression

“Mobile friendly” deep convolutional neural networks

Lab41 Reading Group: Deep Compression

Accelerating Machine Learning

Compressing and regularizing deep neural networks

How fast is my model?

Talks / Videos

Deep compression and EIE: Deep learning model compression, design space exploration and hardware acceleration

Deep Compression, DSD Training and EIE: Deep Neural Network Model Compression, Regularization and Hardware Acceleration

Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation