Computer Vision Datasets

Published: 24 Sep 2015 Category: computer_vision

Datasets: Who is the best at X?

Computer Vision Datasets

Introducing the Open Images Dataset

A parallel download utility for Google’s Open Images dataset

Image & Vision Group - Datasets

Huizhong Chen - Datasets

Classification / Recognition

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

The CIFAR-10 dataset

  • intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
  • homepage:
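A minimal sketch for reading the "python version" of CIFAR-10, assuming the pickled batch layout described on the dataset homepage: each batch is a dict whose b"data" entry is an N x 3072 uint8 array (per image, 1,024 red values, then 1,024 green, then 1,024 blue, each channel in row-major order) and whose b"labels" entry lists the class indices. The helper names `load_batch`, `row_to_image` and `rows_to_images` are hypothetical, not part of any official loader.

```python
import pickle

import numpy as np


def load_batch(path):
    """Load one CIFAR-10 'python version' batch file (assumed pickle layout)."""
    with open(path, "rb") as f:
        # The batches were pickled under Python 2, so keys come back as bytes.
        batch = pickle.load(f, encoding="bytes")
    data = np.asarray(batch[b"data"], dtype=np.uint8)  # shape (N, 3072)
    labels = np.asarray(batch[b"labels"], dtype=np.int64)
    return data, labels


def row_to_image(row):
    """Convert one flat 3072-value row into a 32x32x3 (height, width, channel) image."""
    # Channel-major storage: reshape to (3, 32, 32), then move channels last.
    return np.asarray(row, dtype=np.uint8).reshape(3, 32, 32).transpose(1, 2, 0)


def rows_to_images(data):
    """Vectorised row_to_image for an (N, 3072) array; returns (N, 32, 32, 3)."""
    return np.asarray(data, dtype=np.uint8).reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
```

For example, `data, labels = load_batch("cifar-10-batches-py/data_batch_1")` (the folder name is assumed from the usual archive layout) followed by `rows_to_images(data)` should give a (10000, 32, 32, 3) array; the five training batches of 10,000 images together make up the 50,000 training images.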

Tencent ML-Images


The MegaFace Benchmark: 1 Million Faces for Recognition at Scale

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

MSR Image Recognition Challenge (IRC)

UMDFaces: An Annotated Face Dataset for Training Deep Networks


The Comprehensive Cars (CompCars) dataset

BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance [IEEE T-ITS]

Vehicle Make and Model Recognition Dataset (VMMRdb)

  • intro: contains 9,170 classes and 291,752 images, covering models manufactured between 1950 and 2016
  • homepage:

Cars Dataset

Scene Recognition

Places: An Image Database for Deep Scene Understanding


The Places365-CNNs for Scene Classification


EMNIST: an extension of MNIST to handwritten letters

3 Million Instacart Orders, Open Sourced


YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects

Exclusively Dark (ExDark) Image Dataset

  • intro: The Exclusively Dark (ExDark) dataset is, to the best of our knowledge, the largest collection to date of low-light images, taken in environments ranging from very low light to twilight (i.e. 10 different conditions), with image-class and object-level annotations.
  • github:

Face Detection

FDDB: Face Detection Data Set and Benchmark

WIDER FACE: A Face Detection Benchmark

Pedestrian Detection

Caltech Pedestrian Detection Benchmark

Caltech Pedestrian Dataset Converter

CityPersons: A Diverse Dataset for Pedestrian Detection

CrowdHuman: A Benchmark for Detecting Human in a Crowd

  • intro: CrowdHuman contains 15,000, 4,370 and 5,000 images for training, validation, and testing, respectively. The training and validation subsets contain a total of 470K human instances, about 23 persons per image, with various kinds of occlusion.
  • homepage:
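To check instance statistics like the ones above on a local copy, a small counter is enough. This sketch assumes the .odgt annotation files distributed with CrowdHuman, which store one JSON object per line with an "ID" and a "gtboxes" list; it also assumes that boxes tagged "person" are people and boxes tagged "mask" are ignore regions. The function name `person_stats` is hypothetical.

```python
import json


def person_stats(lines):
    """Return (images, person boxes, mean persons per image) from .odgt lines (assumed format)."""
    images = 0
    persons = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        images += 1
        # Assumption: only boxes tagged "person" count; "mask" boxes are ignore regions.
        persons += sum(1 for box in record.get("gtboxes", []) if box.get("tag") == "person")
    mean = persons / images if images else 0.0
    return images, persons, mean
```

Because the function accepts any iterable of lines, an open file object works directly, e.g. `with open("annotation_train.odgt") as f: print(person_stats(f))`, where the file name is an assumption; running it over both the training and validation files covers the subsets the 470K figure refers to.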

EuroCity Persons Dataset

  • intro: collected on board a moving vehicle in 31 cities of 12 European countries; over 238,200 person instances are manually labeled in over 47,300 images, and the dataset contains a large number of person orientation annotations (over 211,200)
  • arxiv:

Vehicle Detection

Toyota Motor Europe (TME) Motorway Dataset

BIT-Vehicle Dataset

Saliency Detection

MSRA10K Salient Object Database

Logo Detection

QMUL-OpenLogo: Open Logo Detection Challenge

  • intro: QMUL-OpenLogo contains 27,083 images from 352 logo classes, built by aggregating and refining 7 existing datasets and establishing an open logo detection evaluation protocol
  • homepage:

Head Detection


HollywoodHeads dataset

Brainwash dataset

Detection From Video

YouTube-Objects dataset v2.2

ILSVRC2015: Object detection from video (VID)


Mapillary Vistas Dataset

Mapillary Vistas Dataset

Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See

Multi-Human Parsing


Augmented Pascal VOC

Supervisely Person

Microsoft COCO

The Oxford-IIIT Pet Dataset

  • intro: a 37-category pet dataset with roughly 200 images per class. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation
  • homepage:
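Trimaps usually have to be collapsed into a training mask before use. A minimal sketch, assuming the encoding given in the dataset's README (1 = pet, 2 = background, 3 = border / not classified); `trimap_to_mask` is a hypothetical helper, and the trimap is assumed to be already loaded as an integer array (e.g. from its PNG file with Pillow).

```python
import numpy as np

# Assumed trimap encoding: 1 = pet, 2 = background, 3 = border / not classified.
PET, BACKGROUND, BORDER = 1, 2, 3


def trimap_to_mask(trimap, ignore_index=255):
    """Map a trimap to a mask: 1 = pet, 0 = background, ignore_index elsewhere."""
    trimap = np.asarray(trimap)
    # Start with every pixel ignored, so border (and any unexpected value) stays ignored.
    mask = np.full(trimap.shape, ignore_index, dtype=np.uint8)
    mask[trimap == PET] = 1
    mask[trimap == BACKGROUND] = 0
    return mask
```

Leaving the border pixels out of the loss, rather than forcing them into either class, is one common design choice; mapping 3 to 1 instead would give a more generous pet mask.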


COCO-Stuff: Thing and Stuff Classes in Context

COCO-Stuff 10K dataset v1.1

Scene Parsing

MIT Scene Parsing Benchmark


  • intro: train: 20,210 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
  • homepage:

Semantic Understanding of Scenes through the ADE20K Dataset

Captioning / Description

TGIF: A New Dataset and Benchmark on Animated GIF Description

Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk


Dataset        # Videos    # Classes   Year   Manually labeled?
Kodak          1,358       25          2007
HMDB51         7,000       51
Charades       9,848       157
MCG-WEBV       234,414     15          2009
CCV            9,317       20          2011
UCF-101        13,320      101         2012
THUMOS-2       18,394      101         2014
MED-2014       ≈28,000     20          2014
Sports-1M      1M          487         2014
ActivityNet    27,801      203         2015
FCVID          91,223      239         2015

UCF101 - Action Recognition Data Set

HMDB51: A Large Video Database for Human Motion Recognition

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding


Charades Dataset

  • intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
  • intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
  • homepage:

FCVID: Fudan-Columbia Video Dataset

YouTube-8M: A Large-Scale Video Classification Benchmark

stabilized video frames

The Kinetics Human Action Video Dataset

e-Lab Video Data Set(s)

  • intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
  • homepage:

Video Dataset Overview


SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

Autonomous Driving

BDD: Berkeley DeepDrive


COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

Chinese Text in the Wild

ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

DeepFashion: In-shop Clothes Retrieval

Person Re-ID

Dataset Description
CUHK01 971 identities, 3884 images, manually cropped
CUHK02 1816 identities, 7264 images, manually cropped
CUHK03 1360 identities, 13164 images, manually cropped + automatically detected

Person Re-identification Datasets

CUHK Person Re-identification Datasets

PRW (Person Re-identification in the Wild) Dataset

Person Re-identification in the Wild

DukeMTMC-reID

  • intro: DukeMTMC-reID is a subset of the DukeMTMC dataset for image-based re-identification, following the format of the Market-1501 dataset
  • intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
  • github:
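Since DukeMTMC-reID follows the Market-1501 format, identity and camera labels are typically recovered from the image file names. A minimal sketch, assuming names of the form `<person id>_c<camera>...`, e.g. "0005_c2_f0046985.jpg" or the Market-1501 style "0002_c1s1_000451_03.jpg"; a leading "-1" (used for Market-1501 junk images) is also accepted. `parse_name` and `group_by_identity` are hypothetical helpers.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: "<person id>_c<camera id>" at the start of the file name.
_PATTERN = re.compile(r"(-?\d+)_c(\d+)")


def parse_name(filename):
    """Return (person_id, camera_id) parsed from a Market-1501-style file name."""
    name = os.path.basename(filename)
    match = _PATTERN.match(name)  # match() anchors at the start of the name
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def group_by_identity(filenames):
    """Group file names by person id, e.g. to count images per identity."""
    groups = defaultdict(list)
    for name in filenames:
        pid, _ = parse_name(name)
        groups[pid].append(name)
    return dict(groups)
```

Grouping by identity makes it easy to check split statistics such as the 702 training identities quoted above.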


Person Re-ID (PRID) Dataset 2011

MARS (Motion Analysis and Re-identification Set) Dataset

X-MARS Reordering of the MARS Dataset for Image to Video Evaluation


Labeled Pedestrian in the Wild



iQIYI-VID: A Large Dataset for Multi-modal Person Identification


Large-scale Fashion (DeepFashion) Database

Apparel classification with Style

Attribute Datasets

Attribute Datasets

Pedestrian Attribute Recognition

A Richly Annotated Dataset for Pedestrian Attribute Recognition

Pedestrian Attribute Recognition At Far Distance

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project

  • intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, and a new large-scale HD video dataset recorded by 8 synchronized cameras, with more than 7,000 single-camera trajectories and over 2,000 unique identities
  • homepage:

The WILDTRACK Seven-Camera HD Dataset

GOT-10k: Generic Object Tracking Benchmark

Color Classification

Vehicle Color Recognition on an Urban Road by Feature Context

License Plate Detection and Recognition

Application-Oriented License Plate (AOLP) Database

CCPD: Chinese City Parking Dataset


VoTT: Visual Object Tagging Tool 1.5

  • intro: Visual Object Tagging Tool: an Electron app for building end-to-end object detection models from images and videos
  • github:

LabelImg: a graphical image annotation tool for labeling object bounding boxes in images
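As far as we know, LabelImg saves annotations as PASCAL VOC-style XML by default (it can also write YOLO text files). A minimal sketch for reading boxes back with the standard library, assuming the usual VOC layout of `<object>` elements containing `<name>` and `<bndbox>` with `xmin`, `ymin`, `xmax`, `ymax`; `read_voc_boxes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return [(label, xmin, ymin, xmax, ymax), ...] from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects without a bounding box
        # Coordinates may be written as floats (e.g. "110.0"), so parse via float.
        coords = tuple(int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, *coords))
    return boxes
```

For a file on disk, `read_voc_boxes(open(path).read())` works, or the parsing loop can start from `ET.parse(path).getroot()` instead.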

Pychet Labeller

ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).

  • intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
  • github:

Open Image Dataset downloader


Data Labeler for Video

Computer Vision Annotation Tool (CVAT)

  • intro: Computer Vision Annotation Tool (CVAT) is a web-based tool for annotating videos and images for computer vision algorithms
  • github:


BAM! The Behance Artistic Media Dataset


CV Datasets on the web

Awesome Public Datasets

Machine Learning Repository