Computer Vision Datasets

Published: 24 Sep 2015 Category: computer_vision

Datasets: who is the best at X?

blog: http://rodrigob.github.io/are_we_there_yet/build/#datasets

Computer Vision Datasets

website: http://clickdamage.com/sourcecode/index.html
code: http://clickdamage.com/sourcecode/cv_datasets.php
mirror: http://pan.baidu.com/s/1pJmqD4n

Introducing the Open Images Dataset

blog: https://research.googleblog.com/2016/09/introducing-open-images-dataset.html
github: https://github.com/openimages/dataset
Academic Torrents: http://academictorrents.com/details/9e9194e21ce045deee8d811481b4cd676b20b06b

A parallel download util for Google’s open image dataset

github: https://github.com/ejlb/google-open-image-download
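
The idea behind a parallel download utility can be sketched with a thread pool. This is only an illustration, not the code in the repository above: `fetch_url` and `download_all` are hypothetical names, and the sketch assumes every image is reachable by a plain HTTP(S) URL.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Download one URL and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_url, max_workers=8):
    """Fetch every URL concurrently; return {url: bytes or Exception}."""
    def safe_fetch(url):
        # Capture failures per URL so one dead link does not abort the batch.
        try:
            return fetch(url)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Passing `fetch` as a parameter keeps the network call swappable, which makes it easy to add retries or to test the batching logic offline.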

Image & Vision Group - Datasets

intro: Image & Vision, Clothing & Fashion, Computer Graphics, Video Sequences
homepage: http://caiivg.weebly.com/dataset.html

Huizhong Chen - Datasets

intro: Google I/O Dataset, Names 100 Dataset, Clothing Attributes Dataset, Stanford Mobile Visual Search Dataset, CNN 2-Hours Videos Dataset
homepage: http://huizhongchen.github.io/datasets.html#clothingattributedataset

Classification / Recognition

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

project page: http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html
arxiv: http://arxiv.org/abs/1506.08959

CIFAR-10 / CIFAR-100

intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
homepage: http://www.cs.toronto.edu/~kriz/cifar.html
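
The Python version of CIFAR-10 stores each image as a flat row of 3,072 values: 1,024 red, then 1,024 green, then 1,024 blue, each channel in row-major order. A minimal sketch of unpacking one such row with the standard library only; `row_to_pixels` is a hypothetical helper name, not part of any official loader.

```python
SIDE = 32                 # CIFAR images are 32x32
PLANE = SIDE * SIDE       # 1024 values per colour channel


def row_to_pixels(row):
    """Turn a flat 3072-value CIFAR row into a 32x32 grid of (r, g, b)."""
    if len(row) != 3 * PLANE:
        raise ValueError("expected %d values, got %d" % (3 * PLANE, len(row)))
    red, green, blue = row[:PLANE], row[PLANE:2 * PLANE], row[2 * PLANE:]
    return [
        [(red[y * SIDE + x], green[y * SIDE + x], blue[y * SIDE + x])
         for x in range(SIDE)]
        for y in range(SIDE)
    ]


# The split sizes quoted in the intro are consistent with each other.
assert 10 * 6000 == 50000 + 10000 == 60000
```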

Tencent ML-Images

intro: Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet
github: https://github.com/Tencent/tencent-ml-images

Face

The MegaFace Benchmark: 1 Million Faces for Recognition at Scale

homepage: http://megaface.cs.washington.edu/
arxiv: http://arxiv.org/abs/1512.00596

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

arxiv: http://arxiv.org/abs/1607.08221

MSR Image Recognition Challenge (IRC)

homepage: https://www.microsoft.com/en-us/research/project/msr-image-recognition-challenge-irc/

UMDFaces: An Annotated Face Dataset for Training Deep Networks

arxiv: https://arxiv.org/abs/1611.01484

Vehicle

The Comprehensive Cars (CompCars) dataset

http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/

BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance [IEEE T-ITS]

https://medusa.fit.vutbr.cz/traffic/research-topics/fine-grained-vehicle-recognition/boxcars-improving-vehicle-fine-grained-recognition-using-3d-bounding-boxes-in-traffic-surveillance/

Vehicle Make and Model Recognition Dataset (VMMRdb)

intro: contains 9,170 classes comprising 291,752 images, covering models manufactured between 1950 and 2016
homepage: http://vmmrdb.cecsresearch.org/

Cars Dataset

intro: contains 16,185 images of 196 classes of cars.
homepage: http://ai.stanford.edu/~jkrause/cars/car_dataset.html

Scene Recognition

Places: An Image Database for Deep Scene Understanding

project page: http://places.csail.mit.edu/index.html
arxiv: https://arxiv.org/abs/1610.02055

Places2

intro: Places2 contains more than 10 million images comprising 400+ unique scene categories
homepage: http://places2.csail.mit.edu/

The Places365-CNNs for Scene Classification

github: https://github.com/CSAILVision/places365

MNIST

EMNIST: an extension of MNIST to handwritten letters

arxiv: https://arxiv.org/abs/1702.05373

Fashion-MNIST

arxiv: https://arxiv.org/abs/1708.07747
github: https://github.com/zalandoresearch/fashion-mnist
benchmark: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/

Food

3 Million Instacart Orders, Open Sourced

https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2

Detection

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

intro: YouTube-BoundingBoxes (YT-BB)
homepage: https://research.google.com/youtubebb/
arxiv: https://arxiv.org/abs/1702.00824

DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects

https://arxiv.org/abs/1804.00525

Exclusively Dark (ExDark) Image Dataset

intro: The Exclusively Dark (ExDark) dataset is, to the best of its authors' knowledge, the largest collection to date of low-light images, taken in conditions ranging from very low light to twilight (i.e., 10 different conditions), with image-class and object-level annotations.
github: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset

Face Detection

FDDB: Face Detection Data Set and Benchmark

homepage: http://vis-www.cs.umass.edu/fddb/index.html
results: http://vis-www.cs.umass.edu/fddb/results.html

WIDER FACE: A Face Detection Benchmark

homepage: http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/
arxiv: http://arxiv.org/abs/1511.06523

Pedestrian Detection

Caltech Pedestrian Detection Benchmark

homepage: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

Caltech Pedestrian Dataset Converter

https://github.com/mitmul/caltech-pedestrian-dataset-converter

CityPersons: A Diverse Dataset for Pedestrian Detection

arxiv: https://arxiv.org/abs/1702.05693
bitbucket: https://bitbucket.org/shanshanzhang/citypersons
supplemental: http://openaccess.thecvf.com/content_cvpr_2017/supplemental/Zhang_CityPersons_A_Diverse_2017_CVPR_supplemental.pdf

CrowdHuman: A Benchmark for Detecting Human in a Crowd

intro: CrowdHuman contains 15,000, 4,370 and 5,000 images for training, validation, and testing, respectively. The train and validation subsets hold a total of 470K human instances, about 23 persons per image, with various kinds of occlusion.
homepage: https://sshao0516.github.io/CrowdHuman/

EuroCity Persons Dataset

intro: collected on-board a moving vehicle in 31 cities of 12 European countries, over 238200 person instances manually labeled in over 47300 images, contains a large number of person orientation annotations (over 211200)
homepage: https://eurocity-dataset.tudelft.nl/
arxiv: https://arxiv.org/abs/1805.07193

WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild

project page: http://www.cbsr.ia.ac.cn/users/sfzhang/WiderPerson/

Full-Body Annotations

COCO-WholeBody

https://github.com/jin-s13/COCO-WholeBody

Halpe Full-Body Human Keypoints and HOI-Det dataset

intro: Halpe: full body human pose estimation and human-object interaction detection dataset
github: https://github.com/Fang-Haoshu/Halpe-FullBody

Vehicle Detection

Toyota Motor Europe (TME) Motorway Dataset

intro: composed of 28 clips for a total of approximately 27 minutes (30,000+ frames) with vehicle annotation
homepage: http://cmp.felk.cvut.cz/data/motorway/

Welcome to BIT-Vehicle Dataset

intro: 9,850 vehicle images at sizes of 1600×1200 and 1920×1080, captured by two cameras at different times and places
homepage: http://iitlab.bit.edu.cn/mcislab/vehicledb/

Vehicle Re-ID

A Large-Scale Dataset for Vehicle Re-Identification in the Wild

github: https://github.com/PKU-IMRE/VERI-Wild

Logo Detection

QMUL-OpenLogo: Open Logo Detection Challenge

intro: QMUL-OpenLogo contains 27,083 images from 352 logo classes, built by aggregating and refining 7 existing datasets and establishing an open logo detection evaluation protocol
homepage: https://qmul-openlogo.github.io/

Head Detection

SCUT-HEAD

intro: SCUT-HEAD is a large-scale head detection dataset, including 4,405 images labeled with 111,251 heads.
github: https://github.com/HCIILAB/SCUT-HEAD-Dataset-Release

HollywoodHeads dataset

http://www.di.ens.fr/willow/research/headdetection/

Brainwash dataset

https://exhibits.stanford.edu/data/catalog/sx925dc9385

Detection From Video

YouTube-Objects dataset v2.2

homepage: http://calvin.inf.ed.ac.uk/datasets/youtube-objects-dataset/

ILSVRC2015: Object detection from video (VID)

homepage: http://vision.cs.unc.edu/ilsvrc2015/download-videos-3j16.php#vid

Segmentation

Mapillary Vistas Dataset

Mapillary Vistas Dataset

intro: 25,000 high-resolution images, 100 object categories, 60 of those instance-specific
https://www.mapillary.com/dataset/

Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See

http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html

Multi-Human Parsing

https://lv-mhp.github.io/

PASCAL VOC

Augmented Pascal VOC

http://home.bharathh.info/pubs/codes/SBD/download.html

Supervisely Person

Microsoft COCO

homepage: http://mscoco.org/
github: https://github.com/pdollar/coco
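
COCO annotations ship as JSON with top-level `images`, `annotations`, and `categories` lists, and each annotation stores its box as `[x, y, width, height]`. As a rough sketch of reading that layout without the official API (the `boxes_by_image` name and the tiny inline sample are made up for illustration):

```python
import json
from collections import defaultdict


def boxes_by_image(coco):
    """Map each image file name to a list of (category name, bbox) pairs."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    files = {im["id"]: im["file_name"] for im in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[files[ann["image_id"]]].append(
            (names[ann["category_id"]], ann["bbox"])
        )
    return dict(grouped)


# Made-up sample in the COCO layout; real files are loaded with json.load().
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "a.jpg"}],
  "categories": [{"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 7, "image_id": 1, "category_id": 18, "bbox": [10, 20, 30, 40]}
  ]
}
""")
print(boxes_by_image(sample))
```

For real workloads the API in the GitHub repository above is likely the better choice; the sketch is only meant to show how the three lists relate to each other.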

The Oxford-IIIT Pet Dataset

intro: a 37-category pet dataset with roughly 200 images for each class. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation
homepage: http://www.robots.ox.ac.uk/~vgg/data/pets/

COCO-Stuff

COCO-Stuff: Thing and Stuff Classes in Context

COCO-Stuff 10K dataset v1.1

arxiv: https://arxiv.org/abs/1612.03716
github: https://github.com/nightrome/cocostuff

Scene Parsing

MIT Scene Parsing Benchmark

http://sceneparsing.csail.mit.edu/

ADE20K

intro: train: 20,120 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
homepage: http://groups.csail.mit.edu/vision/datasets/ADE20K/

Semantic Understanding of Scenes through the ADE20K Dataset

https://arxiv.org/abs/1608.05442

ImageNet

synsets: http://image-net.org/challenges/LSVRC/2014/browse-det-synsets

ImageNet-Utils

intro: Utils to help download images by id, crop bounding box, label images, etc.
github: https://github.com/tzutalin/ImageNet_Utils

Captioning / Description

TGIF: A New Dataset and Benchmark on Animated GIF Description

arxiv: http://arxiv.org/abs/1604.02748
github: https://github.com/raingo/TGIF-Release

Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk

intro: 1970 YouTube video snippets: 1200 training, 100 validation, 670 test
homepage: http://www.cs.utexas.edu/users/ml/clamp/videoDescription/

Video

Dataset	# Videos	# Classes	Year	Manually Labeled?
Kodak	1,358	25	2007	✓
HMDB51	7,000	51
Charades	9,848	157
MCG-WEBV	234,414	15	2009	✓
CCV	9,317	20	2011	✓
UCF-101	13,320	101	2012	✓
THUMOS-2	18,394	101	2014	✓
MED-2014	≈28,000	20	2014	✓
Sports-1M	1M	487	2014	✗
ActivityNet	27,801	203	2015	✓
FCVID	91,223	239	2015	✓

UCF101 - Action Recognition Data Set

homepage: http://crcv.ucf.edu/data/UCF101.php

HMDB51: A Large Video Database for Human Motion Recognition

homepage: http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding

homepage: http://activity-net.org/
download: http://activity-net.org/download.html
github: https://github.com/activitynet

Sports-1M

homepage: https://github.com/gtoderici/sports-1m-dataset/blob/wiki/ProjectHome.md
github: https://github.com/gtoderici/sports-1m-dataset/
thumbnails: http://cs.stanford.edu/people/karpathy/deepvideo/classes.html

Charades Dataset

intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
homepage: http://allenai.org/plato/charades/

FCVID: Fudan-Columbia Video Dataset

homepage: http://bigvid.fudan.edu.cn/FCVID/

YouTube-8M: A Large-Scale Video Classification Benchmark

homepage: http://research.google.com/youtube8m/
arxiv: http://arxiv.org/abs/1609.08675

stabilized video frames

intro: 9 TB, 35,000,000 clips, 32 frames
intro: Generating Videos with Scene Dynamics
homepage: http://web.mit.edu/vondrick/tinyvideo/#data

The Kinetics Human Action Video Dataset

intro: DeepMind
homepage: https://deepmind.com/research/open-source/open-source-datasets/kinetics/
arxiv: https://arxiv.org/abs/1705.06950

e-Lab Video Data Set(s)

intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
homepage: https://engineering.purdue.edu/elab/eVDS

Video Dataset Overview

intro: Sortable and searchable compilation of video datasets
homepage: https://www.di.ens.fr/~miech/datasetviz/

Scene

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

intro: Imperial College London
project page: https://robotvault.bitbucket.org/scenenet-rgbd.html
arxiv: https://arxiv.org/abs/1612.05079
github: https://github.com/jmccormac/pySceneNetRGBD

Autonomous Driving

BDD: Berkeley DeepDrive

intro: 100,000 HD video sequences of over 1,100-hour driving experience across many different times in the day, weather conditions, and driving scenarios
homepage: http://bdd-data.berkeley.edu/
github: https://github.com/ucbdrive/bdd-data

OCR

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

homepage: http://vision.cornell.edu/se3/coco-text/
arxiv: http://arxiv.org/abs/1601.07140

Chinese Text in the Wild

intro: 32,285 high resolution images, 1,018,402 character instances, 3,850 character categories, 6 kinds of attributes
homepage: https://ctwdataset.github.io/
arxiv: https://arxiv.org/abs/1803.00085

ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

arxiv: https://arxiv.org/abs/1903.10412
github: https://github.com/chongshengzhang/shopsign

Retrieval

Oxford5k

Paris6k

Oxford105k

UKB

NUS-WIDE

ImageNet-YahooQA

University-1652

Dataset and Baseline Code: https://github.com/layumi/University1652-Baseline

DeepFashion: In-shop Clothes Retrieval

intro: 7,982 clothing items; 52,712 in-shop clothes images; ~200,000 cross-pose/scale pairs. Each image is annotated with a bounding box, clothing type, and pose type.
homepage: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html

Person Re-ID

Dataset	Description
CUHK01	971 identities, 3884 images, manually cropped
CUHK02	1816 identities, 7264 images, manually cropped
CUHK03	1360 identities, 13164 images, manually cropped + automatically detected
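
Dividing the image counts in the table by the identity counts gives a quick sense of how many crops each person has. The snippet below only reuses the numbers from the table; the function name is arbitrary.

```python
# (identities, images) copied from the table above
CUHK = {
    "CUHK01": (971, 3884),
    "CUHK02": (1816, 7264),
    "CUHK03": (1360, 13164),
}


def images_per_identity(stats):
    """Return the mean number of images per identity for each dataset."""
    return {name: images / ids for name, (ids, images) in stats.items()}


for name, ratio in images_per_identity(CUHK).items():
    print(f"{name}: {ratio:.2f} images per identity")
```

CUHK01 and CUHK02 work out to exactly 4 images per identity, while CUHK03 averages roughly 9.7.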

Person Re-identification Datasets

CUHK Person Re-identification Datasets

http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html

PRW (Person Re-identification in the Wild) Dataset

homepage: http://www.liangzheng.com.cn/Project/project_prw.html
github: https://github.com/liangzheng06/PRW-baseline

Person Re-identification in the Wild

intro: CVPR 2017 spotlight
arxiv: https://arxiv.org/abs/1604.02531

DukeMTMC-reID

intro: DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset
intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
github: https://github.com/layumi/DukeMTMC-reID_evaluation
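
In the Market-1501 naming style, a file name starts with the person ID followed by `_c` and the camera number, for example `0001_c2_f0046182.jpg`. A small sketch of pulling both fields out of such a name; `parse_reid_name` is a hypothetical helper, and the pattern is an assumption based on that naming style rather than code taken from the evaluation repository.

```python
import re

# Person ID (may be -1 for junk/distractor boxes in Market-1501), then camera.
NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)")


def parse_reid_name(file_name):
    """Return (person_id, camera_id) parsed from a re-ID image file name."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognised file name: {file_name!r}")
    return int(match.group(1)), int(match.group(2))
```

Re-ID evaluation typically needs both fields, since gallery images of the same person from the same camera as the query are usually excluded from the ranking.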

DukeMTMC4ReID

intro: DukeMTMC4ReID dataset
github: https://github.com/NEU-Gou/DukeReID

Person Re-ID (PRID) Dataset 2011

https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/

MARS (Motion Analysis and Re-identification Set) Dataset

intro: an extension of the Market-1501 dataset
homepage: http://www.liangzheng.com.cn/Project/project_mars.html
github: https://github.com/liangzheng06/MARS-evaluation

X-MARS Reordering of the MARS Dataset for Image to Video Evaluation

intro: This repository provides the X-MARS dataset splits for image to video/tracklet evaluation
github: https://github.com/andreas-eberle/x-mars

MSMT17

intro: 15-camera (12 outdoor cameras, 3 indoor cameras), 4,101 Identities, 126,441 BBoxes
homepage: http://www.pkuvmc.com/publications/longhui.html
state of the art: http://www.pkuvmc.com/publications/state_of_the_art.html

Labeled Pedestrian in the Wild

intro: train/test identities: 1,975/756
homepage: http://liuyu.us/dataset/lpw/

SenseReID

https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view

3DPeS

http://www.openvisor.org/3dpes.asp

iQIYI-VID: A Large Dataset for Multi-modal Person Identification

https://arxiv.org/abs/1811.07548

Fashion

Large-scale Fashion (DeepFashion) Database

intro: Attribute Prediction, Consumer-to-shop Clothes Retrieval, In-shop Clothes Retrieval, and Landmark Detection
homepage: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html

Apparel classification with Style

intro: 15 clothing classes, 88951 images
homepage: http://people.ee.ethz.ch/~lbossard/projects/accv12/index.html

Attribute Datasets

Attribute Datasets

intro: in total 41,585 pedestrian samples, each of which is annotated with 72 attributes as well as viewpoints, occlusions, body parts information
homepage: https://www.ecse.rpi.edu/homepages/cvrl/database/AttributeDataset.htm

Pedestrian Attribute Recognition

A Richly Annotated Dataset for Pedestrian Attribute Recognition

homepage: http://rap.idealtest.org/
arxiv: https://arxiv.org/abs/1603.07054

Pedestrian Attribute Recognition At Far Distance

intro: PEdesTrian Attribute (PETA)
homepage: http://mmlab.ie.cuhk.edu.hk/projects/PETA.html
paper: http://personal.ie.cuhk.edu.hk/~pluo/pdf/mm14.pdf

Market-1501_Attribute

github: https://github.com/vana77/Market-1501_Attribute
blog: https://vana77.github.io

DukeMTMC-attribute

github: https://github.com/vana77/DukeMTMC-attribute
blog: https://vana77.github.io

Parse27k

intro: Pedestrian Attribute Recognition in Sequences
intro: >27,000 annotated pedestrians, 10 attributes
homepage: https://www.vision.rwth-aachen.de/page/parse27k
tools: https://github.com/psudowe/parse27k_tools

Tracking

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

homepage: http://detrac-db.rit.albany.edu/
arxiv: https://arxiv.org/abs/1511.04136

DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project

intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, a new large scale HD video data set recorded by 8 synchronized cameras with more than 7,000 single camera trajectories and over 2,000 unique identities
homepage: http://vision.cs.duke.edu/DukeMTMC/

The WILDTRACK Seven-Camera HD Dataset

https://cvlab.epfl.ch/data/wildtrack

GOT-10k: Generic Object Tracking Benchmark

intro: A large, high-diversity, one-shot database for generic object tracking in the wild
project page: http://got-10k.aitestunion.com/
github: https://github.com/got-10k/toolkit

Color Classification

Vehicle Color Recognition on an Urban Road by Feature Context

http://mclab.eic.hust.edu.cn/~pchen/project.html

License Plate Detection and Recognition

Application-Oriented License Plate (AOLP) Database

http://aolpr.ntust.edu.tw/lab/download.html

CCPD: Chinese City Parking Dataset

Face Anti-Spoofing

CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations

intro: ECCV 2020
arxiv: https://arxiv.org/abs/2007.12342
github: https://github.com/Davidzhangyuanhan/CelebA-Spoof

Tools

VoTT: Visual Object Tagging Tool 1.5

intro: Visual Object Tagging Tool: An Electron app for building end-to-end object detection models from images and videos
github: https://github.com/Microsoft/VoTT

LabelImg: a graphical image annotation tool for labeling object bounding boxes in images

github: https://github.com/tzutalin/labelImg
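
LabelImg can save its annotations as XML in the PASCAL VOC layout, with one `object` element per box. A hedged sketch of reading such a file with the standard library; `read_voc_boxes` and the sample annotation are invented for illustration and are not part of LabelImg.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return [(label, xmin, ymin, xmax, ymax)] from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


# Made-up annotation for illustration only.
SAMPLE = """
<annotation>
  <filename>street.jpg</filename>
  <object>
    <name>car</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""
print(read_voc_boxes(SAMPLE))
```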

Pychet Labeller

intro: A Python-based annotation/labelling toolbox for images. The program allows the user to annotate individual objects in images.
github: https://github.com/sbargoti/pychetlabeller

ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).

intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
github: https://github.com/vicolab/ml-pyxis

Open Image Dataset downloader

github: https://github.com/e-lab/crawl-dataset

BBox-Label-Tool

intro: A simple tool for labeling object bounding boxes in images
github: https://github.com/puzzledqs/BBox-Label-Tool

Data Labeler for Video

intro: A GUI tool for conveniently labeling objects in video, using object tracking.
github: https://github.com/hahnyuan/video_labeler

Computer Vision Annotation Tool (CVAT)

intro: Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms
github: https://github.com/opencv/cvat

Artist

BAM! The Behance Artistic Media Dataset

intro: 2.5M artwork urls, 393K attribute labels, 74K short image descriptions/captions
project page: https://bam-dataset.org/
arxiv: https://arxiv.org/abs/1704.08614

Resources

CV Datasets on the web

http://www.cvpapers.com/datasets.html

Awesome Public Datasets

intro: An awesome list of high-quality open datasets in public domains (on-going). By everyone, for everyone!
github: https://github.com/caesar0301/awesome-public-datasets

Machine Learning Repository

https://archive.ics.uci.edu/ml/datasets.html