Computer Vision Datasets
Datasets
Who is the best at X?
Computer Vision Datasets
- website: http://clickdamage.com/sourcecode/index.html
- code: http://clickdamage.com/sourcecode/cv_datasets.php
- mirror: http://pan.baidu.com/s/1pJmqD4n
Introducing the Open Images Dataset
- blog: https://research.googleblog.com/2016/09/introducing-open-images-dataset.html
- github: https://github.com/openimages/dataset
- Academic Torrents: http://academictorrents.com/details/9e9194e21ce045deee8d811481b4cd676b20b06b
A parallel download util for Google’s open image dataset
Image & Vision Group - Datasets
- intro: Image & Vision, Clothing & Fashion, Computer Graphics, Video Sequences
- homepage: http://caiivg.weebly.com/dataset.html
Huizhong Chen - Datasets
- intro: Google I/O Dataset, Names 100 Dataset, Clothing Attributes Dataset, Stanford Mobile Visual Search Dataset, CNN 2-Hours Videos Dataset
- homepage: http://huizhongchen.github.io/datasets.html#clothingattributedataset
Classification / Recognition
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
- project page: http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html
- arxiv: http://arxiv.org/abs/1506.08959
CIFAR-10 / CIFAR-100
- intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
- homepage: http://www.cs.toronto.edu/~kriz/cifar.html
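The figures quoted in the CIFAR-10 intro can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on those numbers; the variable names are illustrative, the three-channel (RGB) layout is an assumption based on "colour images", and nothing is downloaded.

```python
# Figures quoted in the CIFAR-10 intro above.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
IMAGE_SIDE = 32  # images are 32x32 pixels
CHANNELS = 3     # assumption: colour images stored as three channels (RGB)

total_images = NUM_CLASSES * IMAGES_PER_CLASS
values_per_image = IMAGE_SIDE * IMAGE_SIDE * CHANNELS

# The per-class count and the train/test split should describe the same total.
assert total_images == TRAIN_IMAGES + TEST_IMAGES

print(total_images)      # 60000
print(values_per_image)  # 3072
```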
Tencent ML-Images
- intro: Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet
- github: https://github.com/Tencent/tencent-ml-images
Face
The MegaFace Benchmark: 1 Million Faces for Recognition at Scale
- homepage: http://megaface.cs.washington.edu/
- arxiv: http://arxiv.org/abs/1512.00596
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
MSR Image Recognition Challenge (IRC)
UMDFaces: An Annotated Face Dataset for Training Deep Networks
Vehicle
The Comprehensive Cars (CompCars) dataset
http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/
BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance [IEEE T-ITS]
Vehicle Make and Model Recognition Dataset (VMMRdb)
- intro: contains 9,170 classes consisting of 291,752 images, covering models manufactured between 1950 and 2016
- homepage: http://vmmrdb.cecsresearch.org/
Cars Dataset
- intro: contains 16,185 images of 196 classes of cars.
- homepage: http://ai.stanford.edu/~jkrause/cars/car_dataset.html
Scene Recognition
Places: An Image Database for Deep Scene Understanding
- project page: http://places.csail.mit.edu/index.html
- arxiv: https://arxiv.org/abs/1610.02055
Places2
- intro: Places2 contains more than 10 million images comprising 400+ unique scene categories
- homepage: http://places2.csail.mit.edu/
The Places365-CNNs for Scene Classification
MNIST
EMNIST: an extension of MNIST to handwritten letters
Fashion-MNIST
- arxiv: https://arxiv.org/abs/1708.07747
- github: https://github.com/zalandoresearch/fashion-mnist
- benchmark: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/
Food
3 Million Instacart Orders, Open Sourced
https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2
Detection
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
- intro: YouTube-BoundingBoxes (YT-BB)
- homepage: https://research.google.com/youtubebb/
- arxiv: https://arxiv.org/abs/1702.00824
DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects
https://arxiv.org/abs/1804.00525
Exclusively Dark (ExDark) Image Dataset
- intro: The Exclusively Dark (ExDark) dataset is, to the best of the authors' knowledge, the largest collection to date of low-light images, taken in conditions ranging from very low light to twilight (i.e., 10 different conditions), with image-class and object-level annotations.
- github: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset
Face Detection
FDDB: Face Detection Data Set and Benchmark
- homepage: http://vis-www.cs.umass.edu/fddb/index.html
- results: http://vis-www.cs.umass.edu/fddb/results.html
WIDER FACE: A Face Detection Benchmark
Pedestrian Detection
Caltech Pedestrian Detection Benchmark
Caltech Pedestrian Dataset Converter
https://github.com/mitmul/caltech-pedestrian-dataset-converter
CityPersons: A Diverse Dataset for Pedestrian Detection
- arxiv: https://arxiv.org/abs/1702.05693
- bitbucket: https://bitbucket.org/shanshanzhang/citypersons
- supplemental: http://openaccess.thecvf.com/content_cvpr_2017/supplemental/Zhang_CityPersons_A_Diverse_2017_CVPR_supplemental.pdf
CrowdHuman: A Benchmark for Detecting Human in a Crowd
- intro: CrowdHuman contains 15,000, 4,370 and 5,000 images for training, validation, and testing, respectively. The training and validation subsets contain a total of 470K human instances, about 23 persons per image, with various kinds of occlusions.
- homepage: https://sshao0516.github.io/CrowdHuman/
EuroCity Persons Dataset
- intro: collected on-board a moving vehicle in 31 cities of 12 European countries, over 238200 person instances manually labeled in over 47300 images, contains a large number of person orientation annotations (over 211200)
- homepage: https://eurocity-dataset.tudelft.nl/
- arxiv: https://arxiv.org/abs/1805.07193
WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild
- project page: http://www.cbsr.ia.ac.cn/users/sfzhang/WiderPerson/
Full-Body Annotations
COCO-WholeBody
https://github.com/jin-s13/COCO-WholeBody
Halpe Full-Body Human Keypoints and HOI-Det dataset
- intro: Halpe: full body human pose estimation and human-object interaction detection dataset
- github: https://github.com/Fang-Haoshu/Halpe-FullBody
Vehicle Detection
Toyota Motor Europe (TME) Motorway Dataset
- intro: composed of 28 clips for a total of approximately 27 minutes (30,000+ frames) with vehicle annotation
- homepage: http://cmp.felk.cvut.cz/data/motorway/
BIT-Vehicle Dataset
- intro: 9,850 vehicle images, with sizes of 1600×1200 and 1920×1080, captured by two cameras at different times and places
- homepage: http://iitlab.bit.edu.cn/mcislab/vehicledb/
Vehicle Re-ID
A Large-Scale Dataset for Vehicle Re-Identification in the Wild
Logo Detection
QMUL-OpenLogo: Open Logo Detection Challenge
- intro: QMUL-OpenLogo contains 27,083 images from 352 logo classes, built by aggregating and refining 7 existing datasets and establishing an open logo detection evaluation protocol
- homepage: https://qmul-openlogo.github.io/
Head Detection
SCUT-HEAD
- intro: SCUT-HEAD is a large-scale head detection dataset, including 4,405 images labeled with 111,251 heads.
- github: https://github.com/HCIILAB/SCUT-HEAD-Dataset-Release
HollywoodHeads dataset
http://www.di.ens.fr/willow/research/headdetection/
Brainwash dataset
https://exhibits.stanford.edu/data/catalog/sx925dc9385
Detection From Video
YouTube-Objects dataset v2.2
ILSVRC2015: Object detection from video (VID)
Segmentation
Mapillary Vistas Dataset
Mapillary Vistas Dataset
- intro: 25,000 high-resolution images, 100 object categories, 60 of those instance-specific
https://www.mapillary.com/dataset/
Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See
http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html
Multi-Human Parsing
PASCAL VOC
Augmented Pascal VOC
http://home.bharathh.info/pubs/codes/SBD/download.html
Supervisely Person
- homepage: https://supervise.ly/
- blog: https://hackernoon.com/releasing-supervisely-person-dataset-for-teaching-machines-to-segment-humans-1f1fc1f28469
Microsoft COCO
- homepage: http://mscoco.org/
- github: https://github.com/pdollar/coco
The Oxford-IIIT Pet Dataset
- intro: a 37-category pet dataset with roughly 200 images for each class. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation
- homepage: http://www.robots.ox.ac.uk/~vgg/data/pets/
COCO-Stuff
COCO-Stuff: Thing and Stuff Classes in Context
COCO-Stuff 10K dataset v1.1
- arxiv: https://arxiv.org/abs/1612.03716
- github: https://github.com/nightrome/cocostuff
Scene Parsing
MIT Scene Parsing Benchmark
http://sceneparsing.csail.mit.edu/
ADE20K
- intro: train: 20,120 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
- homepage: http://groups.csail.mit.edu/vision/datasets/ADE20K/
Semantic Understanding of Scenes through the ADE20K Dataset
https://arxiv.org/abs/1608.05442
ImageNet
ImageNet-Utils
- intro: Utils to help download images by id, crop bounding box, label images, etc.
- github: https://github.com/tzutalin/ImageNet_Utils
Captioning / Description
TGIF: A New Dataset and Benchmark on Animated GIF Description
Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk
- intro: 1970 YouTube video snippets: 1200 training, 100 validation, 670 test
- homepage: http://www.cs.utexas.edu/users/ml/clamp/videoDescription/
Video
Dataset | # Videos | # Classes | Year | Manually Labeled? |
---|---|---|---|---|
Kodak | 1,358 | 25 | 2007 | ✓ |
HMDB51 | 7,000 | 51 | | |
Charades | 9,848 | 157 | | |
MCG-WEBV | 234,414 | 15 | 2009 | ✓ |
CCV | 9,317 | 20 | 2011 | ✓ |
UCF-101 | 13,320 | 101 | 2012 | ✓ |
THUMOS-2 | 18,394 | 101 | 2014 | ✓ |
MED-2014 | ≈28,000 | 20 | 2014 | ✓ |
Sports-1M | 1M | 487 | 2014 | ✗ |
ActivityNet | 27,801 | 203 | 2015 | ✓ |
FCVID | 91,223 | 239 | 2015 | ✓ |
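For comparing entries in this table programmatically, a minimal parsing sketch follows. It assumes the pipe-delimited layout shown above, treats `1M` as one million and `≈28,000` as 28,000, and leaves blank years as `None`. The helper names `parse_count` and `parse_rows` are hypothetical, introduced only for this example.

```python
# A few rows in the same pipe-delimited layout as the table above.
TABLE = """\
Kodak | 1,358 | 25 | 2007 | ✓ |
HMDB51 | 7,000 | 51 | | |
Sports-1M | 1M | 487 | 2014 | ✗ |
MED-2014 | ≈28,000 | 20 | 2014 | ✓ |
FCVID | 91,223 | 239 | 2015 | ✓ |
"""


def parse_count(text):
    """Turn '1,358', '≈28,000' or '1M' into an int (hypothetical helper)."""
    cleaned = text.replace(",", "").replace("≈", "").strip()
    if cleaned.endswith("M"):
        return int(float(cleaned[:-1]) * 1_000_000)
    return int(cleaned)


def parse_rows(table):
    """Parse each table line into a dict; a blank year becomes None."""
    rows = []
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        name, videos, classes, year = cells[0], cells[1], cells[2], cells[3]
        rows.append({
            "name": name,
            "videos": parse_count(videos),
            "classes": int(classes),
            "year": int(year) if year else None,
        })
    return rows


rows = parse_rows(TABLE)
# Rank the datasets by number of videos, largest first.
ranked = sorted(rows, key=lambda row: row["videos"], reverse=True)
for row in ranked:
    print(row["name"], row["videos"], row["classes"], row["year"])
```

Approximate and abbreviated counts lose their qualifier in this conversion, so the parsed numbers are best used for rough ordering rather than exact statistics.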
UCF101 - Action Recognition Data Set
- homepage: http://crcv.ucf.edu/data/UCF101.php
HMDB51: A Large Video Database for Human Motion Recognition
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
- homepage: http://activity-net.org/
- download: http://activity-net.org/download.html
- github: https://github.com/activitynet
Sports-1M
- homepage: https://github.com/gtoderici/sports-1m-dataset/blob/wiki/ProjectHome.md
- github: https://github.com/gtoderici/sports-1m-dataset/
- thumbnails: http://cs.stanford.edu/people/karpathy/deepvideo/classes.html
Charades Dataset
- intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
- intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
- homepage: http://allenai.org/plato/charades/
FCVID: Fudan-Columbia Video Dataset
- homepage: http://bigvid.fudan.edu.cn/FCVID/
YouTube-8M: A Large-Scale Video Classification Benchmark
- homepage: http://research.google.com/youtube8m/
- arxiv: http://arxiv.org/abs/1609.08675
stabilized video frames
- intro: 9 TB, 35,000,000 clips, 32 frames
- intro: Generating Videos with Scene Dynamics
- homepage: http://web.mit.edu/vondrick/tinyvideo/#data
The Kinetics Human Action Video Dataset
- intro: Google DeepMind
- homepage: https://deepmind.com/research/open-source/open-source-datasets/kinetics/
- arxiv: https://arxiv.org/abs/1705.06950
e-Lab Video Data Set(s)
- intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
- homepage: https://engineering.purdue.edu/elab/eVDS
Video Dataset Overview
- intro: Sortable and searchable compilation of video datasets
- homepage: https://www.di.ens.fr/~miech/datasetviz/
Scene
SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth
- intro: Imperial College London
- project page: https://robotvault.bitbucket.org/scenenet-rgbd.html
- arxiv: https://arxiv.org/abs/1612.05079
- github: https://github.com/jmccormac/pySceneNetRGBD
Autonomous Driving
BDD: Berkeley DeepDrive
- intro: 100,000 HD video sequences covering over 1,100 hours of driving experience across many different times of day, weather conditions, and driving scenarios
- homepage: http://bdd-data.berkeley.edu/
- github: https://github.com/ucbdrive/bdd-data
OCR
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
- homepage: http://vision.cornell.edu/se3/coco-text/
- arxiv: http://arxiv.org/abs/1601.07140
Chinese Text in the Wild
- intro: 32,285 high resolution images, 1,018,402 character instances, 3,850 character categories, 6 kinds of attributes
- homepage: https://ctwdataset.github.io/
- arxiv: https://arxiv.org/abs/1803.00085
ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views
Retrieval
Oxford5k
Paris6k
Oxford105k
UKB
NUS-WIDE
ImageNet-YahooQA
University-1652
- Dataset and Baseline Code: https://github.com/layumi/University1652-Baseline
DeepFashion: In-shop Clothes Retrieval
- intro: 7,982 clothing items, 52,712 in-shop clothes images, and ~200,000 cross-pose/scale pairs. Each image is annotated with a bounding box, clothing type and pose type.
- homepage: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html
Person Re-ID
Dataset | Description |
---|---|
CUHK01 | 971 identities, 3884 images, manually cropped |
CUHK02 | 1816 identities, 7264 images, manually cropped |
CUHK03 | 1360 identities, 13164 images, manually cropped + automatically detected |
Person Re-identification Datasets
- homepage: http://robustsystems.coe.neu.edu/sites/robustsystems.coe.neu.edu/files/systems/projectpages/reiddataset.html
- github: https://github.com/RSL-NEU/person-reid-benchmark
CUHK Person Re-identification Datasets
http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html
PRW (Person Re-identification in the Wild) Dataset
- homepage: http://www.liangzheng.com.cn/Project/project_prw.html
- github: https://github.com/liangzheng06/PRW-baseline
Person Re-identification in the Wild
- intro: CVPR 2017 spotlight
- arxiv: https://arxiv.org/abs/1604.02531
DukeMTMC-reID
- intro: DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset
- intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
- github: https://github.com/layumi/DukeMTMC-reID_evaluation
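Since the intro says the subset follows the Market-1501 format, the sketch below shows one way to pull the person ID and camera ID out of an image file name, and checks the totals implied by the split figures. The naming pattern (`<person id>_c<camera id>...`) and the sample file names are assumptions for illustration, not taken from the entry above; `parse_reid_filename` is a hypothetical helper.

```python
import re

# Assumed pattern: "<person id>_c<camera id>" at the start of the file name.
NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)")


def parse_reid_filename(filename):
    """Return (person_id, camera_id) parsed from an image file name."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


# Illustrative file names (assumed, not listed in the entry above).
print(parse_reid_filename("0001_c2_f0046182.jpg"))     # (1, 2)
print(parse_reid_filename("0001_c1s1_001051_00.jpg"))  # (1, 1)

# Totals implied by the split figures quoted in the intro.
total_identities = 702 + 702
total_images = 16522 + 2228 + 17661
print(total_identities, total_images)  # 1404 36411
```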
DukeMTMC4ReID
- intro: DukeMTMC4ReID dataset
- github: https://github.com/NEU-Gou/DukeReID
Person Re-ID (PRID) Dataset 2011
https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/
MARS (Motion Analysis and Re-identification Set) Dataset
- intro: an extension of the Market-1501 dataset
- homepage: http://www.liangzheng.com.cn/Project/project_mars.html
- github: https://github.com/liangzheng06/MARS-evaluation
X-MARS Reordering of the MARS Dataset for Image to Video Evaluation
- intro: This repository provides the X-MARS dataset splits for image to video/tracklet evaluation
- github: https://github.com/andreas-eberle/x-mars
MSMT17
- intro: 15 cameras (12 outdoor, 3 indoor), 4,101 identities, 126,441 bounding boxes
- homepage: http://www.pkuvmc.com/publications/longhui.html
- state of the art: http://www.pkuvmc.com/publications/state_of_the_art.html
Labeled Pedestrian in the Wild
- intro: train/test identities: 1,975/756
- homepage: http://liuyu.us/dataset/lpw/
SenseReID
https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view
3DPeS
http://www.openvisor.org/3dpes.asp
iQIYI-VID: A Large Dataset for Multi-modal Person Identification
https://arxiv.org/abs/1811.07548
Fashion
Large-scale Fashion (DeepFashion) Database
- intro: Attribute Prediction, Consumer-to-shop Clothes Retrieval, In-shop Clothes Retrieval, and Landmark Detection
- homepage: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
Apparel classification with Style
- intro: 15 clothing classes, 88951 images
- homepage: http://people.ee.ethz.ch/~lbossard/projects/accv12/index.html
Attribute Datasets
Attribute Datasets
- intro: in total 41,585 pedestrian samples, each of which is annotated with 72 attributes as well as viewpoints, occlusions, body parts information
- homepage: https://www.ecse.rpi.edu/homepages/cvrl/database/AttributeDataset.htm
Pedestrian Attribute Recognition
A Richly Annotated Dataset for Pedestrian Attribute Recognition
- homepage: http://rap.idealtest.org/
- arxiv: https://arxiv.org/abs/1603.07054
Pedestrian Attribute Recognition At Far Distance
- intro: PEdesTrian Attribute (PETA)
- homepage: http://mmlab.ie.cuhk.edu.hk/projects/PETA.html
- paper: http://personal.ie.cuhk.edu.hk/~pluo/pdf/mm14.pdf
Market-1501_Attribute
DukeMTMC-attribute
Parse27k
- intro: Pedestrian Attribute Recognition in Sequences
- intro: >27,000 annotated pedestrians, 10 attributes
- homepage: https://www.vision.rwth-aachen.de/page/parse27k
- tools: https://github.com/psudowe/parse27k_tools
Tracking
UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
- homepage: http://detrac-db.rit.albany.edu/
- arxiv: https://arxiv.org/abs/1511.04136
DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project
- intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, a new large scale HD video data set recorded by 8 synchronized cameras with more than 7,000 single camera trajectories and over 2,000 unique identities
- homepage: http://vision.cs.duke.edu/DukeMTMC/
The WILDTRACK Seven-Camera HD Dataset
https://cvlab.epfl.ch/data/wildtrack
GOT-10k: Generic Object Tracking Benchmark
- intro: A large, high-diversity, one-shot database for generic object tracking in the wild
- project page: http://got-10k.aitestunion.com/
- github: https://github.com/got-10k/toolkit
Color Classification
Vehicle Color Recognition on an Urban Road by Feature Context
http://mclab.eic.hust.edu.cn/~pchen/project.html
License Plate Detection and Recognition
Application-Oriented License Plate (AOLP) Database
http://aolpr.ntust.edu.tw/lab/download.html
CCPD: Chinese City Parking Dataset
- paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Zhenbo_Xu_Towards_End-to-End_License_ECCV_2018_paper.pdf
- github: https://github.com/detectRecog/CCPD
- dataset: https://drive.google.com/file/d/1fFqCXjhk7vE9yLklpJurEwP9vdLZmrJd/view
Face Anti-Spoofing
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations
- intro: ECCV 2020
- arxiv: https://arxiv.org/abs/2007.12342
- github: https://github.com/Davidzhangyuanhan/CelebA-Spoof
Tools
VoTT: Visual Object Tagging Tool 1.5
- intro: Visual Object Tagging Tool: An Electron app for building end-to-end object detection models from images and videos
- github: https://github.com/Microsoft/VoTT
LabelImg: a graphical image annotation tool for labeling object bounding boxes in images
Pychet Labeller
- intro: A Python-based annotation/labelling toolbox for images. The program allows the user to annotate individual objects in images.
- github: https://github.com/sbargoti/pychetlabeller
ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).
- intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
- github: https://github.com/vicolab/ml-pyxis
Open Image Dataset downloader
BBox-Label-Tool
- intro: A simple tool for labeling object bounding boxes in images
- github: https://github.com/puzzledqs/BBox-Label-Tool
Data Labeler for Video
- intro: A GUI tool for conveniently labeling objects in video, using object tracking.
- github: https://github.com/hahnyuan/video_labeler
Computer Vision Annotation Tool (CVAT)
- intro: Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms
- github: https://github.com/opencv/cvat
Artist
BAM! The Behance Artistic Media Dataset
- intro: 2.5M artwork urls, 393K attribute labels, 74K short image descriptions/captions
- project page: https://bam-dataset.org/
- arxiv: https://arxiv.org/abs/1704.08614
Resources
CV Datasets on the web
http://www.cvpapers.com/datasets.html
Awesome Public Datasets
- intro: An awesome list of high-quality open datasets in public domains (on-going). By everyone, for everyone!
- github: https://github.com/caesar0301/awesome-public-datasets
Machine Learning Repository
https://archive.ics.uci.edu/ml/datasets.html