Acceleration and Model Compression

Published: 09 Oct 2015 Category: deep_learning

Papers

High-Performance Neural Networks for Visual Object Classification

Predicting Parameters in Deep Learning

Neurons vs Weights Pruning in Artificial Neural Networks

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

cuDNN: Efficient Primitives for Deep Learning

Efficient and accurate approximations of nonlinear convolutional networks

Convolutional Neural Networks at Constrained Time Cost

Flattened Convolutional Neural Networks for Feedforward Acceleration

Compressing Deep Convolutional Networks using Vector Quantization

  • intro: “this paper showed that vector quantization had a clear advantage over matrix factorization methods in compressing fully-connected layers.”
  • arxiv: http://arxiv.org/abs/1412.6115
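A minimal sketch of the simplest quantization idea in this line of work, scalar k-means weight sharing: every weight is replaced by the nearest of k shared centroids, so a layer stores a small float codebook plus a short integer code per weight. Product and residual quantization are not shown. The function names, the linear centroid initialisation and the 32-bit float baseline are assumptions for illustration.

```python
import math
from typing import List, Tuple


def kmeans_quantize(weights: List[float], k: int, iters: int = 20) -> Tuple[List[float], List[int]]:
    """Scalar k-means weight sharing (hypothetical helper).

    Returns (codebook, codes); weights[i] is approximated by codebook[codes[i]].
    """
    lo, hi = min(weights), max(weights)
    # Linear initialisation between min and max weight (an assumption; other inits exist).
    if k == 1:
        codebook = [(lo + hi) / 2]
    else:
        codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign() -> List[int]:
        # Assignment step: index of the nearest centroid for every weight.
        return [min(range(k), key=lambda c: abs(w - codebook[c])) for w in weights]

    for _ in range(iters):
        codes = assign()
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [w for w, code in zip(weights, codes) if code == c]
            if members:  # empty clusters keep their previous position
                codebook[c] = sum(members) / len(members)
    return codebook, assign()


def compression_ratio(n_weights: int, k: int, float_bits: int = 32) -> float:
    """Dense float storage vs. ceil(log2 k)-bit codes plus a float codebook."""
    index_bits = max(1, math.ceil(math.log2(k)))
    return (n_weights * float_bits) / (n_weights * index_bits + k * float_bits)
```

With k = 256 the codes fit in 8 bits, so for a large fully-connected layer the ratio approaches 32 / 8 = 4x before any further entropy coding.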

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

  • intro: “a low-rank CP-decomposition was adopted to transform a convolutional layer into multiple layers of lower complexity”
  • arxiv: http://arxiv.org/abs/1412.6553
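A rough sketch of why a rank-R CP-decomposition lowers cost: a d × d kernel K[t][s][y][x] with S input and T output channels is written as a sum of R rank-one terms, which lets one convolution be replaced by a 1×1 (S→R), a d×1 and a 1×d per-channel convolution, and a 1×1 (R→T). The helper names below are hypothetical, and fitting the factors (plus the fine-tuning the title mentions) is not shown.

```python
from typing import List

Matrix = List[List[float]]


def conv_params(S: int, T: int, d: int) -> int:
    """Weights in a d x d convolution with S input and T output channels (bias ignored)."""
    return d * d * S * T


def cp_conv_params(S: int, T: int, d: int, R: int) -> int:
    """Weights of the rank-R replacement pipeline:
    1x1 (S->R) + d x 1 per-channel (R) + 1 x d per-channel (R) + 1x1 (R->T)."""
    return S * R + d * R + d * R + R * T


def cp_reconstruct(Kt: Matrix, Ks: Matrix, Ky: Matrix, Kx: Matrix):
    """Rebuild the full kernel K[t][s][y][x] = sum_r Kt[t][r] Ks[s][r] Ky[y][r] Kx[x][r].

    Factor shapes: Kt is T x R, Ks is S x R, Ky and Kx are d x R.
    """
    R = len(Kt[0])
    return [[[[sum(Kt[t][r] * Ks[s][r] * Ky[y][r] * Kx[x][r] for r in range(R))
               for x in range(len(Kx))]
              for y in range(len(Ky))]
             for s in range(len(Ks))]
            for t in range(len(Kt))]
```

For S = T = 64, d = 3 and R = 16 this is 36,864 weights versus 2,144; multiply-adds per output position shrink by roughly the same factor when all four stages run at the same spatial resolution.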

Deep Fried Convnets

  • intro: “fully-connected layers were replaced by a single ‘Fastfood’ layer for end-to-end training with convolutional layers”
  • arxiv: http://arxiv.org/abs/1412.7149
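The Fastfood transform replaces a dense d × d matrix with the product S H G Π H B, where S, G and B are diagonal, Π is a permutation and H is the Walsh-Hadamard matrix, so a matrix-vector product costs O(d log d) time and O(d) storage. A minimal forward-pass sketch, assuming d is a power of two (pad the input with zeros otherwise); in an adaptive layer S, G and B would be learned, and the random initialisation here is only an assumption.

```python
import random
from typing import List, Sequence, Tuple


def fwht(x: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform, O(d log d); len(x) must be a power of 2."""
    a = list(x)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a


def fastfood(x, S, G, perm, B) -> List[float]:
    """Compute S H G P H B x without ever forming the d x d matrix."""
    y = [b * xi for b, xi in zip(B, x)]     # B: diagonal of +/-1
    y = fwht(y)                             # H: Walsh-Hadamard
    y = [y[p] for p in perm]                # P: permutation
    y = [g * yi for g, yi in zip(G, y)]     # G: diagonal (Gaussian init)
    y = fwht(y)                             # H again
    return [s * yi for s, yi in zip(S, y)]  # S: diagonal scaling


def random_fastfood_params(d: int, seed: int = 0) -> Tuple[list, list, list, list]:
    """Hypothetical initialisation; S starts at 1 and would be learned with G and B."""
    rng = random.Random(seed)
    B = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    perm = list(range(d))
    rng.shuffle(perm)
    G = [rng.gauss(0.0, 1.0) for _ in range(d)]
    S = [1.0] * d
    return S, G, perm, B
```

The parameters are three length-d vectors and one permutation, versus d² floats for the dense layer it stands in for.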

Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Compressing Neural Networks with the Hashing Trick

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Accelerating Very Deep Convolutional Networks for Classification and Detection

Fast ConvNets Using Group-wise Brain Damage

  • intro: “applied group-wise pruning to the convolutional tensor to decompose it into the multiplications of thinned dense matrices”
  • arxiv: http://arxiv.org/abs/1506.02515
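A simplified sketch of group-wise pruning, assuming each group is the set of weights at one (input channel, kernel row, kernel column) position across all output filters, i.e. one column of the T × (S·d·d) matrix used by im2col-style convolution. Zeroing a whole group lets that column be dropped, leaving a thinner dense matrix for the GEMM. The sketch selects groups by L2 norm only; it does not reproduce the paper's training procedure, and all helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Group = Tuple[int, int, int]  # (input channel s, kernel row y, kernel column x)


def group_norms(W) -> Dict[Group, float]:
    """W[t][s][y][x]; the L2 norm of each group across all output filters t."""
    T, S, H, K = len(W), len(W[0]), len(W[0][0]), len(W[0][0][0])
    return {(s, y, x): math.sqrt(sum(W[t][s][y][x] ** 2 for t in range(T)))
            for s in range(S) for y in range(H) for x in range(K)}


def group_prune(W, keep_fraction: float):
    """Zero every group outside the top keep_fraction by norm; return (pruned W, kept groups)."""
    norms = group_norms(W)
    n_keep = max(1, round(keep_fraction * len(norms)))
    kept = set(sorted(norms, key=norms.get, reverse=True)[:n_keep])
    pruned = [[[[W[t][s][y][x] if (s, y, x) in kept else 0.0
                 for x in range(len(W[0][0][0]))]
                for y in range(len(W[0][0]))]
               for s in range(len(W[0]))]
              for t in range(len(W))]
    return pruned, sorted(kept)


def thinned_matrix(W, kept: List[Group]) -> List[List[float]]:
    """Dense T x len(kept) matrix: the surviving columns of the im2col kernel matrix."""
    return [[W[t][s][y][x] for (s, y, x) in kept] for t in range(len(W))]
```

The matching rows of the im2col patch matrix are dropped too, so the speed-up comes from a smaller dense multiply rather than from sparse kernels.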

Learning both Weights and Connections for Efficient Neural Networks

Data-free parameter pruning for Deep Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Structured Transforms for Small-Footprint Deep Learning

ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines

Reducing the Training Time of Neural Networks by Partitioning

Convolutional neural networks with low-rank regularization

CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Convolutional Tables Ensemble: classification in microseconds

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

SqueezeNet-Residual

Lab41 Reading Group: SqueezeNet

https://gab41.lab41.org/lab41-reading-group-squeezenet-9b9d1d754c75

Simplified_SqueezeNet

SqueezeNet Keras Dogs vs. Cats demo

Convolutional Neural Networks using Logarithmic Data Representation

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

Hardware-oriented Approximation of Convolutional Neural Networks

Deep Neural Networks Under Stress

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

  • intro: “for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).”
  • arxiv: https://arxiv.org/abs/1605.06489

Functional Hashing for Compressing Neural Networks

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Learning Structured Sparsity in Deep Neural Networks

Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial “Bottleneck” Structure

https://arxiv.org/abs/1608.04337

Dynamic Network Surgery for Efficient DNNs

Scalable Compression of Deep Neural Networks

Pruning Filters for Efficient ConvNets

Accelerating Deep Convolutional Networks using low-precision and sparsity

Fixed-point Factorized Networks

Ultimate tensorization: compressing convolutional and FC layers alike

Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

  • intro: “the energy consumption of AlexNet and GoogLeNet are reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss”
  • arxiv: https://arxiv.org/abs/1611.05128

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

LCNN: Lookup-based Convolutional Neural Network

Deep Tensor Convolution on Multicores

  • intro: presents the first practical CPU implementation of tensor convolution optimized for deep networks with small kernels
  • arxiv: https://arxiv.org/abs/1611.06565
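Fast algorithms for small kernels often rely on Winograd-style minimal filtering, which we believe is the family this paper extends to N-dimensional tensors (treat that link as an assumption). A minimal 1-D sketch of F(2,3): two outputs of a 3-tap filter from a 4-sample tile with 4 multiplies instead of 6, once the filter transform has been precomputed. This is the textbook F(2,3), not the paper's implementation.

```python
from typing import List, Sequence


def winograd_filter_transform(g: Sequence[float]) -> List[float]:
    """Precompute once per 3-tap filter g = (g0, g1, g2)."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_f23(d: Sequence[float], gt: Sequence[float]) -> List[float]:
    """Two outputs from the tile d = (d0..d3) using 4 multiplies."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * gt[0]
    m2 = (d1 + d2) * gt[1]
    m3 = (d2 - d1) * gt[2]
    m4 = (d1 - d3) * gt[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_direct(x: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference 'valid' correlation (what CNN layers compute): 3 multiplies per output."""
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]


def conv1d_winograd(x: Sequence[float], g: Sequence[float]) -> List[float]:
    """Overlapping tiles of 4 inputs with stride 2; the output length must be even."""
    n_out = len(x) - 2
    assert n_out > 0 and n_out % 2 == 0
    gt = winograd_filter_transform(g)
    out: List[float] = []
    for i in range(0, n_out, 2):
        out.extend(winograd_f23(x[i:i + 4], gt))
    return out
```

Nesting the 1-D transform along each spatial axis gives the 2-D F(2×2, 3×3) variant (16 multiplies instead of 36 per tile); the extra additions and numerical error grow with tile size, which is why small kernels benefit most.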

Training Sparse Neural Networks

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Deep Learning with INT8 Optimization on Xilinx Devices

Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory

An OpenCL(TM) Deep Learning Accelerator on Arria 10

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

Deep Learning with Low Precision by Half-wave Gaussian Quantization

DL-gleaning: An Approach For Improving Inference Speed And Accuracy

Energy Saving Additive Neural Network

Soft Weight-Sharing for Neural Network Compression

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

DyVEDeep: Dynamic Variable Effort Deep Neural Networks

https://arxiv.org/abs/1704.01137

Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights

https://openreview.net/forum?id=HyQJ-mclg&noteId=HyQJ-mclg

Bayesian Compression for Deep Learning

https://arxiv.org/abs/1705.08665

A Kernel Redundancy Removing Policy for Convolutional Neural Network

https://arxiv.org/abs/1705.10748

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

SEP-Nets: Small and Effective Pattern Networks

MEC: Memory-efficient Convolution for Deep Neural Network

Data-Driven Sparse Structure Selection for Deep Neural Networks

https://arxiv.org/abs/1707.01213

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

An End-to-End Compression Framework Based on Convolutional Neural Networks

https://arxiv.org/abs/1708.00838

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Domain-adaptive deep network compression

Binary-decomposed DCNN for accelerating computation and compressing model without retraining

https://arxiv.org/abs/1709.04731

Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

Improving Efficiency in Convolutional Neural Network with Multilinear Filters

https://arxiv.org/abs/1709.09902

A Survey of Model Compression and Acceleration for Deep Neural Networks

  • intro: IEEE Signal Processing Magazine. IBM Thomas J. Watson Research Center & Tsinghua University & Huazhong University of Science and Technology
  • arxiv: https://arxiv.org/abs/1710.09282

Compression-aware Training of Deep Networks

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

https://arxiv.org/abs/1711.06528

Pruning

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

Neuron Pruning for Compressing Deep Networks using Maxout Architectures

Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

Prune the Convolutional Neural Networks with Sparse Shrink

https://arxiv.org/abs/1708.02439

NISP: Pruning Networks using Neuron Importance Score Propagation

Quantized Neural Networks

Quantized Convolutional Neural Networks for Mobile Devices

Training Quantized Nets: A Deeper Understanding

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Binary Convolutional Neural Networks / Binarized Neural Networks

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

https://arxiv.org/abs/1602.02830

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

https://arxiv.org/abs/1609.07061

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

Espresso: Efficient Forward Propagation for BCNNs

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

https://arxiv.org/abs/1706.02393

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Embedded Binarized Neural Networks

https://arxiv.org/abs/1709.02260

Accelerating / Fast Algorithms

Fast Algorithms for Convolutional Neural Networks

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

https://arxiv.org/abs/1706.01406

Channel Pruning for Accelerating Very Deep Neural Networks

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

https://arxiv.org/abs/1708.04728

Learning Efficient Convolutional Networks through Network Slimming

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

https://arxiv.org/abs/1711.06315

Knowledge Distillation / Knowledge Transfer

Distilling the Knowledge in a Neural Network

Deep Model Compression: Distilling Knowledge from Noisy Teachers

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer

Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

https://arxiv.org/abs/1709.00513

Data-Free Knowledge Distillation for Deep Neural Networks

https://arxiv.org/abs/1710.07535

Knowledge Projection for Deep Neural Networks

https://arxiv.org/abs/1710.09505

Moonshine: Distilling with Cheap Convolutions

https://arxiv.org/abs/1711.02613

model_compression: Implementation of model compression with knowledge distilling method

Code Optimization

Production Deep Learning with NVIDIA GPU Inference Engine

speed improvement by merging batch normalization and scale #5

Add a tool to merge ‘Conv-BN-Scale’ into a single ‘Conv’ layer.

https://github.com/sanghoon/pva-faster-rcnn/commit/39570aab8c6513f0e76e5ab5dba8dfbf63e9c68c/
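The Conv-BN-Scale merge works because at inference time batch normalization and the following per-channel scale/shift are a fixed affine map per output channel: y = γ(z − μ)/√(σ² + ε) + β with z = w·x + b. Folding gives w' = w·γ/√(σ² + ε) and b' = (b − μ)·γ/√(σ² + ε) + β. A minimal sketch, assuming mean and variance are already the final running statistics and that ε matches the framework's setting; a convolution is reduced to a per-channel dot product at one output position, and all helper names are hypothetical.

```python
import math
from typing import List, Tuple


def linear_channels(x, weights, bias) -> List[float]:
    """Stand-in for a conv layer at one output position: out[c] = <weights[c], x> + bias[c]."""
    return [sum(w * xi for w, xi in zip(wc, x)) + bc for wc, bc in zip(weights, bias)]


def batch_norm(z, mean, var, gamma, beta, eps=1e-5) -> List[float]:
    """Inference-time BN followed by the per-channel scale/shift (the 'BN' and 'Scale' steps)."""
    return [g * (zc - m) / math.sqrt(v + eps) + b
            for zc, m, v, g, b in zip(z, mean, var, gamma, beta)]


def fold_bn_into_conv(weights, bias, mean, var, gamma, beta, eps=1e-5) -> Tuple[list, list]:
    """Return (w', b') so that a single conv reproduces Conv -> BN -> Scale."""
    new_w, new_b = [], []
    for wc, bc, m, v, g, b in zip(weights, bias, mean, var, gamma, beta):
        scale = g / math.sqrt(v + eps)           # per-output-channel multiplier
        new_w.append([w * scale for w in wc])    # every weight of channel c is rescaled
        new_b.append((bc - m) * scale + b)       # bias absorbs mean and shift
    return new_w, new_b
```

Usage: compute `fw, fb = fold_bn_into_conv(...)` once offline, drop the BN and Scale layers, and run `linear_channels(x, fw, fb)` (in practice, the convolution with the folded weights). If the original convolution has no bias term, treat b as 0.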

Low-memory GEMM-based convolution algorithms for deep neural networks

https://arxiv.org/abs/1709.03395

Projects

Accelerate Convolutional Neural Networks

OptNet

OptNet - reducing memory usage in torch neural networks

NNPACK: Acceleration package for neural networks on multi-core CPUs

Deep Compression on AlexNet

Tiny Darknet

CACU: Calculate deep convolution neurAl network on Cell Unit

keras_compressor: Model Compression CLI Tool for Keras

Blogs

Neural Networks Are Impressively Good At Compression

https://probablydance.com/2016/04/30/neural-networks-are-impressively-good-at-compression/

“Mobile friendly” deep convolutional neural networks

Lab41 Reading Group: Deep Compression

Accelerating Machine Learning

Compressing and regularizing deep neural networks

https://www.oreilly.com/ideas/compressing-and-regularizing-deep-neural-networks

Talks / Videos

Deep compression and EIE: Deep learning model compression, design space exploration and hardware acceleration

Deep Compression, DSD Training and EIE: Deep Neural Network Model Compression, Regularization and Hardware Acceleration

http://research.microsoft.com/apps/video/default.aspx?id=266664

Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation

Resources

Embedded-Neural-Network