Computer Vision Datasets

Published: 24 Sep 2015 Category: computer_vision

Datasets who is the best at X ?

Computer Vision Datasets

Introducing the Open Images Dataset

A parallel download util for Google’s open image dataset

Image & Vision Group - Datasets

Huizhong Chen - Datasets

Classification / Recognition

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

CIFAR-10 / CIFAR100

  • intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
  • homepage: http://www.cs.toronto.edu/~kriz/cifar.html
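
A minimal sketch of reading one batch from the "python version" of CIFAR-10, assuming the layout described on the homepage: each batch file is a pickled dictionary whose data entry holds one 3072-value row per image (1024 red, then 1024 green, then 1024 blue values, row-major) and whose labels entry holds class indices 0-9. The function names load_cifar10_batch and row_to_image are illustrative, not part of any official loader. The official files store the rows as a numpy array, so numpy must be installed to unpickle them.

```python
import pickle


def row_to_image(row):
    """Convert one flat 3072-value row into a 32x32 grid of (r, g, b) tuples."""
    if len(row) != 3072:
        raise ValueError("expected 3072 values (3 channels x 32 x 32)")
    red, green, blue = row[0:1024], row[1024:2048], row[2048:3072]
    return [
        [(int(red[y * 32 + x]), int(green[y * 32 + x]), int(blue[y * 32 + x]))
         for x in range(32)]
        for y in range(32)
    ]


def load_cifar10_batch(path):
    """Load one batch file and return (images, labels)."""
    with open(path, "rb") as f:
        # encoding="bytes" keeps the dictionary keys as bytes (b"data", b"labels").
        batch = pickle.load(f, encoding="bytes")
    images = [row_to_image(row) for row in batch[b"data"]]
    labels = [int(label) for label in batch[b"labels"]]
    return images, labels
```

Calling load_cifar10_batch on an extracted file such as data_batch_1 should return 10000 images and 10000 labels per batch, if the layout assumption holds.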

Tencent ML-Images

Face

The MegaFace Benchmark: 1 Million Faces for Recognition at Scale

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

MSR Image Recognition Challenge (IRC)

UMDFaces: An Annotated Face Dataset for Training Deep Networks

Vehicle

The Comprehensive Cars (CompCars) dataset

http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/

BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance [IEEE T-ITS]

https://medusa.fit.vutbr.cz/traffic/research-topics/fine-grained-vehicle-recognition/boxcars-improving-vehicle-fine-grained-recognition-using-3d-bounding-boxes-in-traffic-surveillance/

Vehicle Make and Model Recognition Dataset (VMMRdb)

  • intro: contains 291,752 images in 9,170 classes, covering models manufactured between 1950 and 2016
  • homepage: http://vmmrdb.cecsresearch.org/

Cars Dataset

Scene Recognition

Places: An Image Database for Deep Scene Understanding

Places2

The Places365-CNNs for Scene Classification

MNIST

EMNIST: an extension of MNIST to handwritten letters

Fashion-MNIST

Food

3 Million Instacart Orders, Open Sourced

https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2

Detection

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects

https://arxiv.org/abs/1804.00525

Exclusively Dark (ExDark) Image Dataset

  • intro: The Exclusively Dark (ExDark) dataset is, to the best of its authors' knowledge, the largest collection to date of low-light images, taken in conditions ranging from very low-light environments to twilight (i.e., 10 different conditions), with image-class and object-level annotations.
  • github: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset

Face Detection

FDDB: Face Detection Data Set and Benchmark

WIDER FACE: A Face Detection Benchmark

Pedestrian Detection

Caltech Pedestrian Detection Benchmark

Caltech Pedestrian Dataset Converter

https://github.com/mitmul/caltech-pedestrian-dataset-converter

CityPersons: A Diverse Dataset for Pedestrian Detection

CrowdHuman: A Benchmark for Detecting Human in a Crowd

  • intro: CrowdHuman contains 15,000, 4,370 and 5,000 images for training, validation, and testing, respectively. The training and validation subsets contain a total of 470K human instances, an average of 23 persons per image, with various kinds of occlusions.
  • homepage: https://sshao0516.github.io/CrowdHuman/

EuroCity Persons Dataset

WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild

Full-Body Annotations

COCO-WholeBody

https://github.com/jin-s13/COCO-WholeBody

Halpe Full-Body Human Keypoints and HOI-Det dataset

Vehicle Detection

Toyota Motor Europe (TME) Motorway Dataset

BIT-Vehicle Dataset

Vehicle Re-ID

A Large-Scale Dataset for Vehicle Re-Identification in the Wild

Logo Detection

QMUL-OpenLogo: Open Logo Detection Challenge

  • intro: QMUL-OpenLogo contains 27,083 images from 352 logo classes, built by aggregating and refining 7 existing datasets and establishing an open logo detection evaluation protocol
  • homepage: https://qmul-openlogo.github.io/

Head Detection

SCUT-HEAD

HollywoodHeads dataset

http://www.di.ens.fr/willow/research/headdetection/

Brainwash dataset

https://exhibits.stanford.edu/data/catalog/sx925dc9385

Detection From Video

YouTube-Objects dataset v2.2

ILSVRC2015: Object detection from video (VID)

Segmentation

Mapillary Vistas Dataset

Mapillary Vistas Dataset

Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See

http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html

Multi-Human Parsing

https://lv-mhp.github.io/

PASCAL VOC

Augmented Pascal VOC

http://home.bharathh.info/pubs/codes/SBD/download.html

Supervisely Person

Microsoft COCO

The Oxford-IIIT Pet Dataset

  • intro: a 37-category pet dataset with roughly 200 images for each class. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation
  • homepage: http://www.robots.ox.ac.uk/~vgg/data/pets/
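
A small sketch of turning a pixel-level trimap into a binary mask. It assumes the trimap uses the values 1 for foreground (pet), 2 for background, and 3 for unclassified boundary pixels; check the dataset's README before relying on that convention. The function name trimap_to_mask is illustrative, and the trimap is represented as a plain list of rows to keep the example dependency-free.

```python
# Assumed trimap pixel values; verify against the dataset's own README.
FOREGROUND, BACKGROUND, UNCLASSIFIED = 1, 2, 3


def trimap_to_mask(trimap, include_unclassified=False):
    """Return a 0/1 mask with 1 where the pixel is treated as part of the pet."""
    keep = {FOREGROUND}
    if include_unclassified:
        # Optionally count the uncertain boundary band as foreground.
        keep.add(UNCLASSIFIED)
    return [[1 if value in keep else 0 for value in row] for row in trimap]


example = [
    [2, 2, 3],
    [2, 3, 1],
    [3, 1, 1],
]
strict_mask = trimap_to_mask(example)
loose_mask = trimap_to_mask(example, include_unclassified=True)
```

Whether the boundary band should count as foreground depends on the task, which is why it is left as a parameter here.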

COCO-Stuff

COCO-Stuff: Thing and Stuff Classes in Context

COCO-Stuff 10K dataset v1.1

  • arxiv: https://arxiv.org/abs/1612.03716
  • github: https://github.com/nightrome/cocostuff

Scene Parsing

MIT Scene Parsing Benchmark

http://sceneparsing.csail.mit.edu/

ADE20K

  • intro: train: 20,210 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
  • homepage: http://groups.csail.mit.edu/vision/datasets/ADE20K/

Semantic Understanding of Scenes through the ADE20K Dataset

https://arxiv.org/abs/1608.05442

ImageNet

ImageNet-Utils

Captioning / Description

TGIF: A New Dataset and Benchmark on Animated GIF Description

Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk

Video

Dataset      # Videos   # Classes   Year   Manually Labeled?
Kodak        1,358      25          2007
HMDB51       7,000      51
Charades     9,848      157
MCG-WEBV     234,414    15          2009
CCV          9,317      20          2011
UCF-101      13,320     101         2012
THUMOS-2     18,394     101         2014
MED-2014     ≈28,000    20          2014
Sports-1M    1M         487         2014
ActivityNet  27,801     203         2015
FCVID        91,223     239         2015

UCF101 - Action Recognition Data Set

HMDB51: A Large Video Database for Human Motion Recognition

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding

Sports-1M

Charades Dataset

  • intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
  • intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
  • homepage: http://allenai.org/plato/charades/

FCVID: Fudan-Columbia Video Dataset

YouTube-8M: A Large-Scale Video Classification Benchmark

stabilized video frames

The Kinetics Human Action Video Dataset

e-Lab Video Data Set(s)

  • intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
  • homepage: https://engineering.purdue.edu/elab/eVDS

Video Dataset Overview

Scene

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

Autonomous Driving

BDD: Berkeley Deep Drive

OCR

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

Chinese Text in the Wild

ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

Retrieval

Oxford5k

Paris6k

Oxford105k

UKB

NUS-WIDE

ImageNet-YahooQA

University-1652

  • Dataset and Baseline Code: https://github.com/layumi/University1652-Baseline

DeepFashion: In-shop Clothes Retrieval

Person Re-ID

Dataset  Description
CUHK01   971 identities, 3,884 images, manually cropped
CUHK02   1,816 identities, 7,264 images, manually cropped
CUHK03   1,360 identities, 13,164 images, manually cropped + automatically detected
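
As a quick sanity check on the figures listed for the CUHK datasets, the sketch below computes the average number of images per identity. The dictionary simply restates the counts from the table, and the helper name images_per_identity is illustrative.

```python
# (identities, images) as listed in the table above.
CUHK_COUNTS = {
    "CUHK01": (971, 3884),
    "CUHK02": (1816, 7264),
    "CUHK03": (1360, 13164),
}


def images_per_identity(identities, images):
    """Average number of images available for each identity."""
    return images / identities


averages = {
    name: images_per_identity(identities, images)
    for name, (identities, images) in CUHK_COUNTS.items()
}

for name, value in averages.items():
    print(f"{name}: {value:.2f} images per identity")
```

CUHK01 and CUHK02 work out to exactly 4 images per identity, while CUHK03 averages roughly 9.7.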

Person Re-identification Datasets

CUHK Person Re-identification Datasets

http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html

PRW (Person Re-identification in the Wild) Dataset

Person Re-identification in the Wild

DukeMTMC-reID

  • intro: DukeMTMC-reID is a subset of the DukeMTMC dataset for image-based re-identification, in the format of the Market-1501 dataset
  • intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
  • github: https://github.com/layumi/DukeMTMC-reID_evaluation
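
A hedged sketch of working with the split sizes quoted above and with image filenames. The filename pattern (a four-digit identity, a camera number, and a frame number, e.g. 0005_c2_f0046985.jpg) is an assumption about how the released images are named and should be checked against the actual download; parse_duke_filename is an illustrative helper, not part of the evaluation code.

```python
import re

# Split sizes as stated in the intro above.
SPLIT_SIZES = {"train": 16522, "query": 2228, "gallery": 17661}
TOTAL_IMAGES = sum(SPLIT_SIZES.values())

# Assumed naming scheme: <identity>_c<camera>_f<frame>.jpg
_FILENAME = re.compile(r"^(\d{4})_c(\d)_f(\d+)\.jpg$")


def parse_duke_filename(name):
    """Return (identity, camera, frame) parsed from an image filename."""
    match = _FILENAME.match(name)
    if match is None:
        raise ValueError(f"unexpected filename format: {name!r}")
    identity, camera, frame = (int(group) for group in match.groups())
    return identity, camera, frame
```

Grouping images by the parsed identity and camera is the usual first step before building query/gallery pairs, since re-identification evaluation typically excludes same-camera matches.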

DukeMTMC4ReID

Person Re-ID (PRID) Dataset 2011

https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/

MARS (Motion Analysis and Re-identification Set) Dataset

X-MARS Reordering of the MARS Dataset for Image to Video Evaluation

MSMT17

Labeled Pedestrian in the Wild

SenseReID

https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view

3DPeS

http://www.openvisor.org/3dpes.asp

iQIYI-VID: A Large Dataset for Multi-modal Person Identification

https://arxiv.org/abs/1811.07548

Fashion

Large-scale Fashion (DeepFashion) Database

Apparel classification with Style

Attribute Datasets

Attribute Datasets

Pedestrian Attribute Recognition

A Richly Annotated Dataset for Pedestrian Attribute Recognition

Pedestrian Attribute Recognition At Far Distance

Market-1501_Attribute

DukeMTMC-attribute

Parse27k

Tracking

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project

  • intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, and a new large-scale HD video data set recorded by 8 synchronized cameras with more than 7,000 single-camera trajectories and over 2,000 unique identities
  • homepage: http://vision.cs.duke.edu/DukeMTMC/

The WILDTRACK Seven-Camera HD Dataset

https://cvlab.epfl.ch/data/wildtrack

GOT-10k: Generic Object Tracking Benchmark

Color Classification

Vehicle Color Recognition on an Urban Road by Feature Context

http://mclab.eic.hust.edu.cn/~pchen/project.html

License Plate Detection and Recognition

Application-Oriented License Plate (AOLP) Database

http://aolpr.ntust.edu.tw/lab/download.html

CCPD: Chinese City Parking Dataset

Face Anti-Spoofing

CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations

Tools

VoTT: Visual Object Tagging Tool 1.5

  • intro: Visual Object Tagging Tool: An Electron app for building end-to-end object detection models from images and videos
  • github: https://github.com/Microsoft/VoTT

LabelImg: a graphical image annotation tool for labeling object bounding boxes in images

Pychet Labeller

ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).

  • intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
  • github: https://github.com/vicolab/ml-pyxis

Open Image Dataset downloader

BBox-Label-Tool

Data Labeler for Video

Computer Vision Annotation Tool (CVAT)

  • intro: Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms
  • github: https://github.com/opencv/cvat

Artist

BAM! The Behance Artistic Media Dataset

Resources

CV Datasets on the web

http://www.cvpapers.com/datasets.html

Awesome Public Datasets

Machine Learning Repository

https://archive.ics.uci.edu/ml/datasets.html