Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning

Papers

High-Performance Neural Networks for Visual Object Classification

Predicting Parameters in Deep Learning

Neurons vs Weights Pruning in Artificial Neural Networks

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

cuDNN: Efficient Primitives for Deep Learning

Efficient and accurate approximations of nonlinear convolutional networks

Convolutional Neural Networks at Constrained Time Cost

Flattened Convolutional Neural Networks for Feedforward Acceleration

Compressing Deep Convolutional Networks using Vector Quantization

  • intro: “this paper showed that vector quantization had a clear advantage over matrix factorization methods in compressing fully-connected layers.” A sketch of the idea follows below.
  • arxiv: http://arxiv.org/abs/1412.6115
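
The scheme is easy to prototype: run k-means over small sub-vectors of the weight matrix and store only the codebook plus one index per sub-vector. Below is a rough NumPy/scikit-learn sketch of such product quantization; the function names and sizes are illustrative assumptions, not the paper’s code.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_fc(W, subvec_len=4, n_codes=256):
    """Product-quantize an (out, in) weight matrix block-wise with k-means."""
    out_dim, in_dim = W.shape
    assert in_dim % subvec_len == 0
    blocks = W.reshape(-1, subvec_len)            # one row per sub-vector
    km = KMeans(n_clusters=n_codes, n_init=4).fit(blocks)
    codebook = km.cluster_centers_                # n_codes x subvec_len floats
    codes = km.labels_.astype(np.uint8)           # one byte per sub-vector
    return codebook, codes

def dequantize_fc(codebook, codes, shape):
    return codebook[codes].reshape(shape)

W = np.random.randn(512, 1024).astype(np.float32)
codebook, codes = quantize_fc(W)
W_hat = dequantize_fc(codebook, codes, W.shape)
print("reconstruction MSE:", float(np.mean((W - W_hat) ** 2)))
```

At this setting the 2 MB float32 matrix shrinks to a 4 KB codebook plus 128 KB of byte codes, roughly 15x smaller.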

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

  • intro: “a low-rank CP-decomposition was adopted to transform a convolutional layer into multiple layers of lower complexity.” The resulting layer sequence is sketched below.
  • arxiv: http://arxiv.org/abs/1412.6553
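
For a KxK convolution, a rank-R CP-decomposition turns the single layer into four cheap ones: a 1x1 convolution into R components, a Kx1 and a 1xK per-component (grouped) convolution, and a 1x1 convolution back to the output channels. A minimal PyTorch sketch of that layer sequence, assuming the factor weights would be copied in from a CP solver:

```python
import torch
import torch.nn as nn

def cp_decomposed_conv(c_in, c_out, k, rank, padding=1):
    """Replace one KxK conv with the four-layer CP sequence."""
    return nn.Sequential(
        nn.Conv2d(c_in, rank, kernel_size=1, bias=False),    # mix input channels into R components
        nn.Conv2d(rank, rank, kernel_size=(k, 1), groups=rank,
                  padding=(padding, 0), bias=False),          # per-component vertical filter
        nn.Conv2d(rank, rank, kernel_size=(1, k), groups=rank,
                  padding=(0, padding), bias=False),          # per-component horizontal filter
        nn.Conv2d(rank, c_out, kernel_size=1, bias=False),    # expand back to output channels
    )

layer = cp_decomposed_conv(64, 64, k=3, rank=16)
y = layer(torch.randn(1, 64, 32, 32))   # same spatial size as a padded 3x3 conv
```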

Deep Fried Convnets

  • intro: “fully-connected layers were replaced by a single ‘Fastfood’ layer for end-to-end training with convolutional layers.” The transform is sketched below.
  • arxiv: http://arxiv.org/abs/1412.7149
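
A Fastfood layer approximates a dense product Wx by S H G Pi H B x, where B, G, S are diagonal, Pi is a random permutation, and H is the Walsh-Hadamard transform, so an n x n weight block costs only O(n) parameters and O(n log n) time. A hedged NumPy sketch, with a dense Hadamard matrix standing in for the fast transform:

```python
import numpy as np
from scipy.linalg import hadamard

n = 1024                                    # block size; must be a power of two
H = hadamard(n) / np.sqrt(n)                # normalized Walsh-Hadamard matrix
B = np.random.choice([-1.0, 1.0], size=n)   # random sign flips
Pi = np.random.permutation(n)               # random permutation
G = np.random.randn(n)                      # Gaussian scaling
S = np.random.randn(n)                      # scaling diagonal (learned in Deep Fried Convnets)

def fastfood(x):
    v = H @ (B * x)    # sign-flip, then Hadamard
    v = v[Pi]          # permute
    v = H @ (G * v)    # scale, then Hadamard again
    return S * v       # final diagonal scaling

x = np.random.randn(n)
y = fastfood(x)        # stands in for one n x n block of a dense layer
```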

Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Compressing Neural Networks with the Hashing Trick

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Accelerating Very Deep Convolutional Networks for Classification and Detection

Fast ConvNets Using Group-wise Brain Damage

  • intro: “applied group-wise pruning to the convolutional tensor to decompose it into the multiplications of thinned dense matrices.” A toy version is sketched below.
  • arxiv: http://arxiv.org/abs/1506.02515
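
Read concretely, group-wise pruning zeroes whole (input-channel, kernel-position) groups across all output filters, so entire columns disappear from the im2col matrix and the surviving computation stays dense. A rough NumPy sketch using magnitude-based group selection (an illustrative stand-in for the paper’s trained group sparsity):

```python
import numpy as np

def groupwise_prune(W, keep_ratio=0.5):
    """W: (c_out, c_in, kh, kw) conv tensor. Zero the weakest groups."""
    c_out, c_in, kh, kw = W.shape
    group_norms = np.sqrt((W ** 2).sum(axis=0)).reshape(-1)  # one norm per (c_in, kh, kw) group
    n_keep = int(keep_ratio * group_norms.size)
    keep = np.argsort(group_norms)[-n_keep:]                 # indices of the strongest groups
    mask = np.zeros(group_norms.size, dtype=bool)
    mask[keep] = True
    return W * mask.reshape(1, c_in, kh, kw)

W = np.random.randn(64, 32, 3, 3)
W_pruned = groupwise_prune(W, keep_ratio=0.25)   # keeps 25% of the 32*3*3 groups
```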

Learning both Weights and Connections for Efficient Neural Networks

Data-free parameter pruning for Deep Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Structured Transforms for Small-Footprint Deep Learning

ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines

Reducing the Training Time of Neural Networks by Partitioning

Convolutional neural networks with low-rank regularization

CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Convolutional Tables Ensemble: classification in microseconds

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

SqueezeNet-Residual

Lab41 Reading Group: SqueezeNet

https://gab41.lab41.org/lab41-reading-group-squeezenet-9b9d1d754c75

Simplified_SqueezeNet

SqueezeNet Keras Dogs vs. Cats demo

Convolutional Neural Networks using Logarithmic Data Representation

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

Hardware-oriented Approximation of Convolutional Neural Networks

Deep Neural Networks Under Stress

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

  • intro: “for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).” A sketch of the filter-group structure follows below.
  • arxiv: https://arxiv.org/abs/1605.06489
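
The savings come from the paper’s root modules: a grouped convolution in which each filter sees only a subset of the input channels, followed by a 1x1 convolution that mixes information back across groups. A speculative PyTorch sketch of one such block:

```python
import torch.nn as nn

def root_module(c_in, c_out, k=3, groups=8, padding=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=k, groups=groups,
                  padding=padding, bias=False),          # filter groups: KxK cost drops ~groups-fold
        nn.Conv2d(c_out, c_out, kernel_size=1, bias=False),  # 1x1 conv restores cross-group mixing
    )

block = root_module(256, 256, k=3, groups=8)
```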

Functional Hashing for Compressing Neural Networks

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Learning Structured Sparsity in Deep Neural Networks

Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial “Bottleneck” Structure

https://arxiv.org/abs/1608.04337

Dynamic Network Surgery for Efficient DNNs

Scalable Compression of Deep Neural Networks

Pruning Filters for Efficient ConvNets

Fixed-point Factorized Networks

Ultimate tensorization: compressing convolutional and FC layers alike

Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

  • intro: “the energy consumption of AlexNet and GoogLeNet are reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss”
  • arxiv: https://arxiv.org/abs/1611.05128

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

LCNN: Lookup-based Convolutional Neural Network

Deep Tensor Convolution on Multicores

  • intro: presents the first practical CPU implementation of tensor convolution, optimized for deep networks of small kernels
  • arxiv: https://arxiv.org/abs/1611.06565

Training Sparse Neural Networks

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Deep Learning with INT8 Optimization on Xilinx Devices

Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory

An OpenCL(TM) Deep Learning Accelerator on Arria 10

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

DL-gleaning: An Approach For Improving Inference Speed And Accuracy

Energy Saving Additive Neural Network

Soft Weight-Sharing for Neural Network Compression

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

DyVEDeep: Dynamic Variable Effort Deep Neural Networks

https://arxiv.org/abs/1704.01137

Bayesian Compression for Deep Learning

https://arxiv.org/abs/1705.08665

A Kernel Redundancy Removing Policy for Convolutional Neural Network

https://arxiv.org/abs/1705.10748

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

SEP-Nets: Small and Effective Pattern Networks

MEC: Memory-efficient Convolution for Deep Neural Network

Data-Driven Sparse Structure Selection for Deep Neural Networks

https://arxiv.org/abs/1707.01213

An End-to-End Compression Framework Based on Convolutional Neural Networks

https://arxiv.org/abs/1708.00838

Domain-adaptive deep network compression

Binary-decomposed DCNN for accelerating computation and compressing model without retraining

https://arxiv.org/abs/1709.04731

Improving Efficiency in Convolutional Neural Network with Multilinear Filters

https://arxiv.org/abs/1709.09902

A Survey of Model Compression and Acceleration for Deep Neural Networks

  • intro: IEEE Signal Processing Magazine. IBM Thomas J. Watson Research Center & Tsinghua University & Huazhong University of Science and Technology
  • arxiv: https://arxiv.org/abs/1710.09282

Compression-aware Training of Deep Networks

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

https://arxiv.org/abs/1711.06528

Reducing Deep Network Complexity with Fourier Transform Methods

EffNet: An Efficient Structure for Convolutional Neural Networks

Universal Deep Neural Network Compression

https://arxiv.org/abs/1802.02271

Paraphrasing Complex Network: Network Compression via Factor Transfer

https://arxiv.org/abs/1802.04977

Compressing Neural Networks using the Variational Information Bottleneck

Adversarial Network Compression

https://arxiv.org/abs/1803.10750

Expanding a robot’s life: Low power object recognition via FPGA-based DCNN deployment

Accelerating CNN inference on FPGAs: A Survey

Doubly Nested Network for Resource-Efficient Inference

https://arxiv.org/abs/1806.07568

Smallify: Learning Network Size while Training

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

Cascaded Projection: End-to-End Network Compression and Acceleration

https://arxiv.org/abs/1903.04988

FALCON: Fast and Lightweight Convolution for Compressing and Accelerating CNN

https://arxiv.org/abs/1909.11321

Compressing Deep Neural Network

Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks

https://arxiv.org/abs/1808.05240

ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions

A Framework for Fast and Efficient Neural Network Compression

https://arxiv.org/abs/1811.12781

ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples

https://arxiv.org/abs/1811.12673

Pruning

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

Neuron Pruning for Compressing Deep Networks using Maxout Architectures

Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

Prune the Convolutional Neural Networks with Sparse Shrink

https://arxiv.org/abs/1708.02439

NISP: Pruning Networks using Neuron Importance Score Propagation

Automated Pruning for Deep Neural Network Compression

https://arxiv.org/abs/1712.01721

Learning to Prune Filters in Convolutional Neural Networks

https://arxiv.org/abs/1801.07365

Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks

A novel channel pruning method for deep neural network compression

https://arxiv.org/abs/1805.11394

PCAS: Pruning Channels with Attention Statistics

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Progressive Deep Neural Networks Acceleration via Soft Filter Pruning

https://arxiv.org/abs/1808.07471

Pruning neural networks: is it time to nip it in the bud?

https://arxiv.org/abs/1810.04622

Rethinking the Value of Network Pruning

https://arxiv.org/abs/1810.05270

Dynamic Channel Pruning: Feature Boosting and Suppression

https://arxiv.org/abs/1810.05331

Interpretable Convolutional Filter Pruning

https://arxiv.org/abs/1810.07322

Progressive Weight Pruning of Deep Neural Networks using ADMM

https://arxiv.org/abs/1810.07378

Pruning Deep Neural Networks using Partial Least Squares

Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices

https://arxiv.org/abs/1811.00482

Discrimination-aware Channel Pruning for Deep Neural Networks

Stability Based Filter Pruning for Accelerating Deep CNNs

Structured Pruning for Efficient ConvNets via Incremental Regularization

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks

https://arxiv.org/abs/1811.08589

A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

https://arxiv.org/abs/1812.11337

Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning

https://arxiv.org/abs/1901.07827

Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

https://arxiv.org/abs/1901.11391

Pruning from Scratch

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

FNNP: Fast Neural Network Pruning Using Adaptive Batch Normalization

Pruning Filter in Filter

Low-Precision Networks

Accelerating Deep Convolutional Networks using low-precision and sparsity

Deep Learning with Low Precision by Half-wave Gaussian Quantization

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

https://arxiv.org/abs/1706.02393

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

Learning Low Precision Deep Neural Networks through Regularization

https://arxiv.org/abs/1809.00095

Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference

https://arxiv.org/abs/1809.04191

SQuantizer: Simultaneous Learning for Both Sparse and Low-precision Neural Networks

Quantized Neural Networks

Quantized Convolutional Neural Networks for Mobile Devices

Training Quantized Nets: A Deeper Understanding

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Deep Neural Network Compression with Single and Multiple Level Quantization

Quantizing deep convolutional networks for efficient inference: A whitepaper

CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)

Joint Training of Low-Precision Neural Network with Quantization Interval Parameters

https://arxiv.org/abs/1808.05779

Differentiable Fine-grained Quantization for Deep Neural Network Compression

https://arxiv.org/abs/1810.10351

HAQ: Hardware-Aware Automated Quantization

https://arxiv.org/abs/1811.08886

DNQ: Dynamic Network Quantization

Trained Rank Pruning for Efficient Deep Neural Networks

Training Quantized Network with Auxiliary Gradient Module

FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Bit Efficient Quantization for Deep Neural Networks

Quantization Networks

Adaptive Loss-aware Quantization for Multi-bit Networks

https://arxiv.org/abs/1912.08883

Distribution Adaptive INT8 Quantization for Training CNNs

Distance-aware Quantization

Binary Convolutional Neural Networks / Binarized Neural Networks

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

https://arxiv.org/abs/1602.02830

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

https://arxiv.org/abs/1609.07061

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

XNOR-Net++: Improved Binary Neural Networks

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

https://arxiv.org/abs/1606.06160

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

Espresso: Efficient Forward Propagation for BCNNs

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Embedded Binarized Neural Networks

https://arxiv.org/abs/1709.02260

Compact Hash Code Learning with Binary Deep Neural Network

Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning

https://arxiv.org/abs/1802.00904

From Hashing to CNNs: Training Binary Weight Networks via Hashing

https://arxiv.org/abs/1802.02733

Energy Efficient Hadamard Neural Networks

  • keywords: Binary Weight and Hadamard-transformed Image Network (BWHIN), Binary Weight Network (BWN), Hadamard-transformed Image Network (HIN)
  • arxiv: https://arxiv.org/abs/1805.05421

Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?

https://arxiv.org/abs/1806.07550

Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm

Training Compact Neural Networks with Binary Weights and Low Precision Activations

https://arxiv.org/abs/1808.02631

Training wide residual networks for deployment using a single bit for each weight

Composite Binary Decomposition Networks

https://arxiv.org/abs/1811.06668

Training Competitive Binary Neural Networks from Scratch

Regularizing Activation Distribution for Training Binarized Deep Networks

GBCNs: Genetic Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

Training Binary Neural Networks with Real-to-Binary Convolutions

Accelerating / Fast Algorithms

Fast Algorithms for Convolutional Neural Networks

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

https://arxiv.org/abs/1706.01406

Channel Pruning for Accelerating Very Deep Neural Networks

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

https://arxiv.org/abs/1708.04728

Learning Efficient Convolutional Networks through Network Slimming

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

https://arxiv.org/abs/1711.06315

Accelerating Convolutional Neural Networks for Continuous Mobile Vision via Cache Reuse

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

SBNet: Sparse Blocks Network for Fast Inference

Accelerating deep neural networks with tensor decompositions

A Survey on Acceleration of Deep Convolutional Neural Networks

https://arxiv.org/abs/1802.00939

Recurrent Residual Module for Fast Inference in Videos

Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications

Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA

https://arxiv.org/abs/1809.03318

Accelerating Deep Neural Networks with Spatial Bottleneck Modules

https://arxiv.org/abs/1809.02601

FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations

https://arxiv.org/abs/1808.09945

Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

https://arxiv.org/abs/1810.03979

DAC: Data-free Automatic Acceleration of Convolutional Networks

Learning Instance-wise Sparsity for Accelerating Deep Models

Code Optimization

Production Deep Learning with NVIDIA GPU Inference Engine

speed improvement by merging batch normalization and scale #5

Add a tool to merge ‘Conv-BN-Scale’ into a single ‘Conv’ layer (the folding math is sketched below).

https://github.com/sanghoon/pva-faster-rcnn/commit/39570aab8c6513f0e76e5ab5dba8dfbf63e9c68c/
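
The merge works because BatchNorm plus Scale is a per-output-channel affine map, so it can be folded into the convolution’s weights and bias offline; inference then runs a single conv with identical outputs up to float rounding. A generic NumPy sketch of the folding arithmetic (names are illustrative, not the linked tool’s API):

```python
import numpy as np

def fold_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma*(conv(x)-mean)/sqrt(var+eps) + beta into the conv.

    W: (c_out, c_in, kh, kw) conv weights; b and all BN stats: (c_out,).
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel multiplier
    W_folded = W * scale[:, None, None, None]   # scale each output filter
    b_folded = (b - mean) * scale + beta        # shift the bias to absorb BN
    return W_folded, b_folded
```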

Low-memory GEMM-based convolution algorithms for deep neural networks

https://arxiv.org/abs/1709.03395

Projects

Accelerate Convolutional Neural Networks

OptNet

OptNet - reducing memory usage in torch neural networks

NNPACK: Acceleration package for neural networks on multi-core CPUs

Deep Compression on AlexNet

Tiny Darknet

CACU: Calculate deep convolution neurAl network on Cell Unit

keras_compressor: Model Compression CLI Tool for Keras

Blogs

Neural Networks Are Impressively Good At Compression

https://probablydance.com/2016/04/30/neural-networks-are-impressively-good-at-compression/

“Mobile friendly” deep convolutional neural networks

Lab41 Reading Group: Deep Compression

Accelerating Machine Learning

Compressing and regularizing deep neural networks

https://www.oreilly.com/ideas/compressing-and-regularizing-deep-neural-networks

How fast is my model?

http://machinethink.net/blog/how-fast-is-my-model/

Talks / Videos

Deep compression and EIE: Deep learning model compression, design space exploration and hardware acceleration

Deep Compression, DSD Training and EIE: Deep Neural Network Model Compression, Regularization and Hardware Acceleration

http://research.microsoft.com/apps/video/default.aspx?id=266664

Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation

Resources

awesome-model-compression-and-acceleration

https://github.com/sun254/awesome-model-compression-and-acceleration

Embedded-Neural-Network