Classification / Recognition
Papers
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
- author: Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell
- arxiv: http://arxiv.org/abs/1310.1531
CNN Features off-the-shelf: an Astounding Baseline for Recognition
- intro: CVPR 2014
- arxiv: http://arxiv.org/abs/1403.6382
HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification
HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition
- intro: ICCV 2015
- intro: introduces hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy
- project page: https://sites.google.com/site/homepagezhichengyan/home/hdcnn
- arxiv: https://arxiv.org/abs/1410.0736
- code: https://sites.google.com/site/homepagezhichengyan/home/hdcnn/code
- github: https://github.com/stephenyan1231/caffe-public/tree/hdcnn
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- intro: ImageNet top-5 error: 4.94%
- arxiv: http://arxiv.org/abs/1502.01852
- notes: http://blog.csdn.net/happynear/article/details/45440811
Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks
Deep Convolutional Networks on the Pitch Spiral for Musical Instrument Recognition
- paper: https://github.com/lostanlen/ismir2016/blob/master/paper/lostanlen_ismir2016.pdf
- github: https://github.com/lostanlen/ismir2016
Humans and deep networks largely agree on which kinds of variation make object recognition harder
- arxiv: http://arxiv.org/abs/1604.06486
- review: https://www.technologyreview.com/s/601387/why-machine-vision-is-flawed-in-the-same-way-as-human-vision/
FusionNet: 3D Object Classification Using Multiple Data Representations
From image recognition to object recognition
Deep FisherNet for Object Classification
Factorized Bilinear Models for Image Recognition
- intro: TuSimple
- arxiv: https://arxiv.org/abs/1611.05709
- github(MXNet): https://github.com/lyttonhao/Factorized-Bilinear-Network
Hyperspectral CNN Classification with Limited Training Samples
The More You Know: Using Knowledge Graphs for Image Classification
- intro: CMU. GSNN
- arxiv: https://arxiv.org/abs/1612.04844
MaxMin Convolutional Neural Networks for Image Classification
- paper: http://webia.lip6.fr/~thomen/papers/Blot_ICIP_2016.pdf
- github: https://github.com/karandesai-96/maxmin-cnn
Cost-Effective Active Learning for Deep Image Classification
- intro: TCSVT 2016
- intro: Sun Yat-sen University & Guangzhou University
- arxiv: https://arxiv.org/abs/1701.03551
Deep Collaborative Learning for Visual Recognition
https://www.arxiv.org/abs/1703.01229
Convolutional Low-Resolution Fine-Grained Classification
https://arxiv.org/abs/1703.05393
Multi-Scale Dense Networks for Resource Efficient Image Classification
- intro: Cornell University & Fudan University & Tsinghua University & Facebook AI Research
- arxiv: https://arxiv.org/abs/1703.09844
- github: https://github.com/gaohuang/MSDNet
Deep Mixture of Diverse Experts for Large-Scale Visual Recognition
https://arxiv.org/abs/1706.07901
Sunrise or Sunset: Selective Comparison Learning for Subtle Attribute Recognition
- intro: BMVC 2017
- arxiv: https://arxiv.org/abs/1707.06335
Why Do Deep Neural Networks Still Not Recognize These Images?: A Qualitative Analysis on Failure Cases of ImageNet Classification
- intro: Poster presented at CVPR 2017 Scene Understanding Workshop
- arxiv: https://arxiv.org/abs/1709.03439
B-CNN: Branch Convolutional Neural Network for Hierarchical Classification
https://arxiv.org/abs/1709.09890
Learning Transferable Architectures for Scalable Image Recognition
- intro: Google Brain
- keywords: Neural Architecture Search
- arxiv: https://arxiv.org/abs/1707.07012
AOGNets: Deep AND-OR Grammar Networks for Visual Recognition
https://arxiv.org/abs/1711.05847
Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN
- intro: University of Southern California & Google Research
- arxiv: https://arxiv.org/abs/1711.07607
Between-class Learning for Image Classification
- intro: The University of Tokyo & RIKEN
- arxiv: https://arxiv.org/abs/1711.10284
Efficient Traffic-Sign Recognition with Scale-aware CNN
- intro: BMVC 2017
- arxiv: https://arxiv.org/abs/1805.12289
Co-domain Embedding using Deep Quadruplet Networks for Unseen Traffic Sign Recognition
- intro: AAAI 2018
- arxiv: https://arxiv.org/abs/1712.01907
µNet: A Highly Compact Deep Convolutional Neural Network Architecture for Real-time Embedded Traffic Sign Classification
https://arxiv.org/abs/1804.00497
Deep Predictive Coding Network for Object Recognition
https://arxiv.org/abs/1802.04762
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
- intro: CVPR 2018. The Robotics Institute, Carnegie Mellon University
- arxiv: https://arxiv.org/abs/1803.08035
Attention-based Pyramid Aggregation Network for Visual Place Recognition
- intro: ACM MM 2018
- arxiv: https://arxiv.org/abs/1808.00288
How do Convolutional Neural Networks Learn Design?
- intro: ICPR 2018
- arxiv: https://arxiv.org/abs/1808.08402
Making Classification Competitive for Deep Metric Learning
https://arxiv.org/abs/1811.12649
In Defense of the Triplet Loss for Visual Recognition
- intro: University of Maryland & Honda Research Institute
- arxiv: https://arxiv.org/abs/1901.08616
All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
- intro: CVPR 2019
- arxiv: https://arxiv.org/abs/1903.05285
Deep CNN-based Multi-task Learning for Open-Set Recognition
https://arxiv.org/abs/1903.03161
Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks
https://arxiv.org/abs/1611.05916
Large-Scale Long-Tailed Recognition in an Open World
- intro: CVPR 2019 oral
- intro: CUHK & UC Berkeley / ICSI
- project page: https://liuziwei7.github.io/projects/LongTail.html
- arxiv: https://arxiv.org/abs/1904.05160
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- intro: Google Research, Brain Team
- arxiv: https://arxiv.org/abs/2010.11929
- github: https://github.com/google-research/vision_transformer
High-Performance Large-Scale Image Recognition Without Normalization
- intro: NFNet
- arxiv: https://arxiv.org/abs/2102.06171
- github: https://github.com/deepmind/deepmind-research/tree/master/nfnets
Massive Classification
Accelerated Training for Massive Classification via Dynamic Class Selection
- intro: AAAI 2018. CUHK & SenseTime
- keywords: HF-Softmax
- arxiv: https://arxiv.org/abs/1801.01687
- github: https://github.com/yl-1993/hfsoftmax
Multi-object Recognition
Multiple Object Recognition with Visual Attention
- keywords: deep recurrent neural network, reinforcement learning
- arxiv: https://arxiv.org/abs/1412.7755
- github: https://github.com/jrbtaylor/visual-attention
Multiple Instance Learning Convolutional Neural Networks for Object Recognition
- intro: ICPR 2016 Oral
- arxiv: https://arxiv.org/abs/1610.03155
Multi-Label Classification
Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification
- intro: CVPR 2017
- intro: University of Science and Technology of China & CUHK
- arxiv: https://arxiv.org/abs/1702.05891
- github(official, Caffe): https://github.com/zhufengx/SRN_multilabel/
Order-Free RNN with Visual Attention for Multi-Label Classification
https://arxiv.org/abs/1707.05495
Learning Social Image Embedding with Deep Multimodal Attention Networks
- intro: Beihang University & Microsoft Research
- arxiv: https://arxiv.org/abs/1710.06582
Multi-label Image Recognition by Recurrently Discovering Attentional Regions
- intro: ICCV 2017
- arxiv: https://arxiv.org/abs/1711.02816
Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition
- intro: AAAI 2018
- arxiv: https://arxiv.org/abs/1712.07465
A Baseline for Multi-Label Image Classification Using Ensemble Deep CNN
https://arxiv.org/abs/1811.08412
Multi-class Classification without Multi-class Labels
- intro: ICLR 2019
- arxiv: https://arxiv.org/abs/1901.00544
Learning a Deep ConvNet for Multi-label Classification with Partial Labels
- intro: CVPR 2019
- arxiv: https://arxiv.org/abs/1902.09720
Multi-Label Image Recognition with Graph Convolutional Networks
- intro: CVPR 2019
- arxiv: https://arxiv.org/abs/1904.03582
- github: https://github.com/chenzhaomin123/ML_GCN
General Multi-label Image Classification with Transformers
- intro: University of Virginia
- arxiv: https://arxiv.org/abs/2011.14027
Person Recognition
Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues
- intro: UC Berkeley & Facebook AI Research
- keywords: People In Photo Albums (PIPA) dataset, Pose Invariant PErson Recognition (PIPER)
- project page: https://people.eecs.berkeley.edu/~nzhang/piper.html
- arxiv: https://arxiv.org/abs/1501.05703
COCO_v1
Learning Deep Features via Congenerous Cosine Loss for Person Recognition
- keywords: COCO loss
- arxiv: https://arxiv.org/abs/1702.06890
- github: https://github.com/sciencefans/coco_loss
Pose-Aware Person Recognition
- intro: CVIT & Facebook AI Research
- arxiv: https://arxiv.org/abs/1705.10120
COCO_v2
Rethinking Feature Discrimination and Polymerization for Large-scale Recognition
- intro: NIPS 2017 Deep Learning Workshop
- keywords: COCO loss
- arxiv: https://arxiv.org/abs/1710.00870
- github: https://github.com/sciencefans/coco_loss
Person Recognition in Social Media Photos
https://arxiv.org/abs/1710.03224
Unifying Identification and Context Learning for Person Recognition
- intro: CVPR 2018
- paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Unifying_Identification_and_CVPR_2018_paper.pdf
Fine-grained Recognition
Bilinear CNN Models for Fine-grained Visual Recognition
- intro: ICCV 2015
- homepage: http://vis-www.cs.umass.edu/bcnn/
- paper: http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf
- arxiv: http://arxiv.org/abs/1504.07889
- bitbucket: https://bitbucket.org/tsungyu/bcnn.git
Fine-grained Image Classification by Exploring Bipartite-Graph Labels
- intro: CVPR 2016
- project page: http://www.f-zhou.com/fg.html
- arxiv: http://arxiv.org/abs/1512.02665
- demo: http://www.f-zhou.com/fg_demo/
Embedding Label Structures for Fine-Grained Feature Representation
- intro: CVPR 2016
- arxiv: http://arxiv.org/abs/1512.02895
- paper: http://webpages.uncc.edu/~szhang16/paper/CVPR16_structured_labels.pdf
Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop
Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition
Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition
Learning Deep Representations of Fine-grained Visual Descriptions
- intro: CVPR 2016
- arxiv: http://arxiv.org/abs/1605.05395
- github: https://github.com/reedscot/cvpr2016
IDNet: Smartphone-based Gait Recognition with Convolutional Neural Networks
Picking Deep Filter Responses for Fine-grained Image Recognition
- intro: CVPR 2016
SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-grained Recognition
- intro: CVPR 2016
Part-Stacked CNN for Fine-Grained Visual Categorization
- intro: CVPR 2016
Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches
- intro: BMVC 2016
- arxiv: https://arxiv.org/abs/1610.06756
Low-rank Bilinear Pooling for Fine-Grained Classification
- intro: CVPR 2017
- project page: http://www.ics.uci.edu/~skong2/lr_bilinear.html
- arxiv: https://arxiv.org/abs/1611.05109
- github: https://github.com/aimerykong/Low-Rank-Bilinear-Pooling
Fine-grained Image Analysis
- intro: by Jianxin Wu (吴建鑫), NJU. VALSE 2017 Annual Progress Review Series
- slides: http://mac.xmu.edu.cn/valse2017/ppt/APR/wjx_APR.pdf
Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition
- intro: CVPR 2017
- paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf
Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach
- intro: ICCV 2017
- arxiv: https://arxiv.org/abs/1709.02476
Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition
https://arxiv.org/abs/1709.05769
Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition
- intro: ICCV 2017
- keywords: MA-CNN
- intro: University of Science and Technology of China & Microsoft Research & University of Rochester
- paper: http://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf
TransFG: A Transformer Architecture for Fine-grained Recognition
- intro: Johns Hopkins University & ByteDance Inc.
- arxiv: https://arxiv.org/abs/2103.07976
Food Recognition
DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment
Im2Calories: towards an automated mobile vision food diary
- intro: recognizes the contents of a meal from a single image, then predicts its nutritional contents, such as calories
- paper: http://www.cs.ubc.ca/~murphyk/Papers/im2calories_iccv15.pdf
Food Image Recognition by Using Convolutional Neural Networks (CNNs)
Wide-Slice Residual Networks for Food Recognition
Food Classification with Deep Learning in Keras / Tensorflow
- blog: http://blog.stratospark.com/deep-learning-applied-food-classification-deep-learning-keras.html
- github: https://github.com/stratospark/food-101-keras
ChineseFoodNet: A large-scale Image Dataset for Chinese Food Recognition
https://arxiv.org/abs/1705.02743
Computer vision-based food calorie estimation: dataset, method, and experiment
https://arxiv.org/abs/1705.07632
Deep Learning-Based Food Calorie Estimation Method in Dietary Assessment
https://arxiv.org/abs/1706.04062
Food Ingredients Recognition through Multi-label Learning
https://arxiv.org/abs/1707.08816
FoodNet: Recognizing Foods Using Ensemble of Deep Networks
- intro: IEEE Signal Processing Letters
- arxiv: https://arxiv.org/abs/1709.09429
Food recognition and recipe analysis: integrating visual content, context and external knowledge
https://arxiv.org/abs/1801.07230
Attribute Recognition
Multi-task CNN Model for Attribute Prediction
- intro: IEEE Transactions paper
- arxiv: https://arxiv.org/abs/1601.00400
Attributes for Improved Attributes: A Multi-Task Network for Attribute Classification
https://arxiv.org/abs/1604.07360
Generative Adversarial Models for People Attribute Recognition in Surveillance
- intro: AVSS 2017 oral
- arxiv: https://arxiv.org/abs/1707.02240
Attribute Recognition by Joint Recurrent Learning of Context and Correlation
- intro: ICCV 2017
- arxiv: https://arxiv.org/abs/1709.08553
Multi-label Object Attribute Classification using a Convolutional Neural Network
https://arxiv.org/abs/1811.04309
Pedestrian Attribute Recognition / Person Attribute Recognition
Multi-attribute Learning for Pedestrian Attribute Recognition in Surveillance Scenarios
- intro: ACPR 2015
- keywords: DeepSAR / DeepMAR
- paper: http://or.nsfc.gov.cn/bitstream/00001903-5/417802/1/1000014103914.pdf
- github: https://github.com/kyu-sz/DeepMAR_deploy
- github: https://github.com/dangweili/pedestrian-attribute-recognition-pytorch
Pedestrian Attribute Recognition At Far Distance
- intro: ACM MM 2014
- paper: http://personal.ie.cuhk.edu.hk/~pluo/pdf/mm14.pdf
Person Attribute Recognition with a Jointly-trained Holistic CNN Model
- intro: ICCV 2015
- keywords: Parse27k
- paper: https://www.vision.rwth-aachen.de/media/papers/sudowe_spitzer_leibe_ICCV_LaP_2015.pdf
Human Attribute Recognition by Deep Hierarchical Contexts
- intro: ECCV 2016
- paper: http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2016_human.pdf
Robust Pedestrian Attribute Recognition for an Unbalanced Dataset using Mini-batch Training with Rarity Rate
- intro: Intelligent Vehicles Symposium 2016
- intro: Chubu University & Nagoya University, Japan
- paper: http://www.vision.cs.chubu.ac.jp/MPRG/C_group/C081_fukui2016.pdf
Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization
Deep View-Sensitive Pedestrian Attribute Inference in an end-to-end Model
- intro: BMVC 2017
- keywords: PETA, RAP and WIDER
- arxiv: https://arxiv.org/abs/1707.06089
- github: https://github.com/asc-kit/vespa
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
- intro: ICCV 2017
- intro: CUHK & SenseTime
- keywords: multi-directional attention (MDA)
- arxiv: https://arxiv.org/abs/1709.09930
- paper: http://openaccess.thecvf.com/content_ICCV_2017/papers/Liu_HydraPlus-Net_Attentive_Deep_ICCV_2017_paper.pdf
- github: https://github.com/xh-liu/HydraPlus-Net
Deep Imbalanced Attribute Classification using Visual Attention Aggregation
- intro: ECCV 2018
- intro: University of Houston
- arxiv: https://arxiv.org/abs/1807.03903
Localization Guided Learning for Pedestrian Attribute Recognition
- intro: BMVC 2018
- arxiv: https://arxiv.org/abs/1808.09102
Grouping Attribute Recognition for Pedestrian with Joint Recurrent Learning
- intro: IJCAI 2018
- paper: https://www.ijcai.org/proceedings/2018/0441.pdf
Sequence-based Person Attribute Recognition with Joint CTC-Attention Model
- keywords: joint CTC-Attention model (JCM), connectionist temporal classification (CTC)
- arxiv: https://arxiv.org/abs/1811.08115
The Deeper, the Better: Analysis of Person Attributes Recognition
https://arxiv.org/abs/1901.03756
Video-Based Pedestrian Attribute Recognition
https://arxiv.org/abs/1901.05742
Pedestrian Attribute Recognition: A Survey
- intro: Anhui University
- project page: https://sites.google.com/view/ahu-pedestrianattributes/
- arxiv: https://arxiv.org/abs/1901.07474
Papers with code: Pedestrian Attribute Recognition
https://paperswithcode.com/task/pedestrian-attribute-recognition/codeless
Pedestrian-Attribute-Recognition-Paper-List
https://github.com/wangxiao5791509/Pedestrian-Attribute-Recognition-Paper-List
Attribute Aware Pooling for Pedestrian Attribute Recognition
- intro: IJCAI 2019
- intro: Huawei Noah’s Ark Lab & University of Sydney
- arxiv: https://arxiv.org/abs/1907.11837
Distraction-Aware Feature Learning for Human Attribute Recognition via Coarse-to-Fine Attention Mechanism
- intro: AAAI 2020 oral
- arxiv: https://arxiv.org/abs/1911.11351
Rethinking of Pedestrian Attribute Recognition: Realistic Datasets with Efficient Method
- intro: University of Chinese Academy of Sciences & Chinese Academy of Sciences
- arxiv: https://arxiv.org/abs/2005.11909
- github(official, PyTorch): https://github.com/valencebond/Strong_Baseline_of_Pedestrian_Attribute_Recognition
Hierarchical Feature Embedding for Attribute Recognition
- intro: SenseTime Group Limited & Tsinghua University
- arxiv: https://arxiv.org/abs/2005.11576
Clothes Recognition
DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations
- intro: CVPR 2016
- keywords: FashionNet
- project page: http://personal.ie.cuhk.edu.hk/~lz013/projects/DeepFashion.html
- paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Liu_DeepFashion_Powering_Robust_CVPR_2016_paper.pdf
Multi-Task Curriculum Transfer Deep Learning of Clothing Attributes
Star-galaxy Classification
Star-galaxy Classification Using Deep Convolutional Neural Networks
- intro: MNRAS
- arxiv: http://arxiv.org/abs/1608.04369
- github: https://github.com/EdwardJKim/dl4astro
Logo Recognition
Deep Learning for Logo Recognition
Plant Classification
Large-Scale Plant Classification with Deep Neural Networks
- intro: Published in Proceedings of the ACM Computing Frontiers Conference 2017
- arxiv: https://arxiv.org/abs/1706.03736
Scene Recognition / Scene Classification
Learning Deep Features for Scene Recognition using Places Database
- paper: http://places.csail.mit.edu/places_NIPS14.pdf
- github: https://github.com/metalbubble/places365
Using neon for Scene Recognition: Mini-Places2
- intro: This is an implementation of the deep residual network used for Mini-Places2, as described in He et al., “Deep Residual Learning for Image Recognition”.
- blog: http://www.nervanasys.com/using-neon-for-scene-recognition-mini-places2/
- github: https://github.com/hunterlang/mpmz
Scene Classification with Inception-7
Semantic Clustering for Robust Fine-Grained Scene Recognition
Scene recognition with CNNs: objects, scales and dataset bias
- intro: CVPR 2016
- arxiv: https://arxiv.org/abs/1801.06867
Leaderboard
Leaderboard of Places Database
- intro: current rank 1: Qian Zhang (Beijing Samsung Telecom R&D Center), 0.6410@top1, 0.9065@top5
- homepage: http://places.csail.mit.edu/user/leaderboard.php
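The top1 and top5 figures quoted for the leaderboard are top-k accuracies: a prediction counts as correct when the true class is among the k highest-scoring classes. The following is a minimal sketch in plain Python of how such a number could be computed. The function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the Places evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists of class scores (hypothetical toy input).
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy data: 4 samples, 3 classes (made-up scores, for illustration only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```

A top-k error rate, such as the ImageNet top-5 error quoted earlier in this list, is one minus the corresponding top-k accuracy.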
Blogs
What is the class of this image? - Discover the current state of the art in objects classification
- intro: “Discover the current state of the art in objects classification.”
- intro: MNIST, CIFAR-10, CIFAR-100, STL-10, SVHN, ILSVRC2012 task 1
- blog: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
Object Recognition with Convolutional Neural Networks in the Keras Deep Learning Library
The Effect of Resolution on Deep Neural Network Image Classification Accuracy