Dataset collection for (deep) machine learning and computer vision

This page provides a collection of (image and text) datasets for (deep) machine learning and computer vision problems.

=====Image datasets======

***Dataset for Natural Images******

ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently there is an average of over five hundred images per node. The creators of the dataset hope ImageNet will become a useful resource for researchers, educators, students, and everyone who shares their passion for pictures.

    • TBA


***Dataset for Sketch images******



Selected papers that used the dataset:

Xu, P., Huang, Y., Yuan, T., Pang, K., Song, Y. Z., Xiang, T., … & Guo, J. (2018). SketchMate: Deep hashing for million-scale human sketch retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8090-8098).

Citation bibtex:

author={Eitz, Mathias and Hays, James and Alexa, Marc},
title={How Do Humans Sketch Objects?},
journal={ACM Trans. Graph. (Proc. SIGGRAPH)},
pages = {44:1--44:10}


We investigate the problem of fine-grained sketch-based image retrieval (SBIR), where free-hand human sketches are used as queries to perform instance-level retrieval of images. This is an extremely challenging task because (i) visual comparisons not only need to be fine-grained but also executed cross-domain, (ii) free-hand (finger) sketches are highly abstract, making fine-grained matching harder, and most importantly (iii) annotated cross-domain sketch-photo datasets required for training are scarce, challenging many state-of-the-art machine learning techniques. In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based image retrieval application. We introduce a new database of 1,432 sketch-photo pairs from two categories with 32,000 fine-grained triplet ranking annotations. We then develop a deep triplet ranking model for instance-level SBIR with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data. Extensive experiments are carried out to contribute a variety of insights into the challenges of data sufficiency and overfitting avoidance when training deep networks for fine-grained cross-domain ranking tasks.
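
The triplet ranking idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the squared Euclidean distance, the margin value of 0.3, and the function names are all assumptions chosen for illustration, and in practice the embeddings would come from a deep network rather than hand-written lists.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_ranking_loss(
    sketch: Sequence[float],
    positive_photo: Sequence[float],
    negative_photo: Sequence[float],
    margin: float = 0.3,  # assumed value, for illustration only
) -> float:
    """Hinge-style triplet loss: max(0, margin + d(s, p+) - d(s, p-)).

    The loss is zero once the matching photo is closer to the sketch
    than the non-matching photo by at least `margin`.
    """
    d_pos = squared_distance(sketch, positive_photo)
    d_neg = squared_distance(sketch, negative_photo)
    return max(0.0, margin + d_pos - d_neg)
```

For example, `triplet_ranking_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], margin=0.5)` gives 1.5 because the sketch sits closer to the wrong photo; the loss falls to 0 once the matching photo is closer by at least the margin.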

Database: a new dataset is available (ShoeV2: 2,000 photos + 6,648 sketches).


Citation bibtex:

title={Sketch Me That Shoe},
author={Yu, Qian and Liu, Feng and Song, Yi-Zhe and Xiang, Tao and Hospedales, Timothy and Loy, Chen Change},
booktitle={Computer Vision and Pattern Recognition},

Results updated: on the Shoes dataset, acc.@1 is 52.17%; on the Chairs dataset, acc.@1 is 72.16%.
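
As a rough illustration of how an acc.@1 figure like those above is typically computed (this is an assumption about the metric, not the authors' evaluation code): a query sketch counts as correct when its true photo is the nearest one in the gallery. The function names and the squared Euclidean distance below are hypothetical choices.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def accuracy_at_k(
    query_embeddings: Sequence[Sequence[float]],
    gallery_embeddings: Sequence[Sequence[float]],
    true_indices: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose true gallery item is among the k nearest."""
    if not query_embeddings:
        raise ValueError("at least one query is required")
    hits = 0
    for query, true_index in zip(query_embeddings, true_indices):
        distances = [squared_distance(query, item) for item in gallery_embeddings]
        # Rank gallery indices from nearest to farthest for this query.
        ranked = sorted(range(len(gallery_embeddings)), key=distances.__getitem__)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(query_embeddings)
```

With `k=1` this gives the acc.@1 style of score; larger `k` values give the more forgiving acc.@k variants.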


Citation bibtex:

 author = {Patsorn Sangkloy and Nathan Burnell and Cusuh Ham and James Hays},
 title = {The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies},
 journal = {ACM Transactions on Graphics (proceedings of SIGGRAPH)},
 year = {2016},

The authors present a new dataset of paired images and contour drawings for the study of visual understanding and sketch generation. In this dataset, there are 1,000 outdoor images and each is paired with 5 human drawings (5,000 drawings in total). The drawings have strokes roughly aligned with image boundaries, making it easier to match human strokes to image edges.

The dataset was collected with Amazon Mechanical Turk. The Turkers were asked to trace over a faint background image. To obtain high-quality annotations, the authors designed a labeling interface with a detailed instruction page that includes many positive and negative examples. Quality control was performed through manual inspection, treating annotations of the following types as rejection candidates: (1) missing inner boundaries, (2) missing important objects, (3) large misalignment with the original edges, (4) unrecognizable content, (5) humans drawn as stick figures, (6) shading on empty areas. Therefore, in addition to the 5,000 accepted drawings, there are 1,947 rejected submissions, which can be used in setting up an automatic quality guard.

License: the dataset is licensed under CC BY-NC-SA (Attribution-NonCommercial-ShareAlike). That means you can use this dataset for non-commercial purposes, and your adapted work should be shared under similar conditions.


Citation bibtex:

  title={Photo-Sketching: Inferring Contour Drawings from Images},
  author={Li, Mengtian and Lin, Zhe and M\v ech, Radom\'ir and Yumer, Ersin and Ramanan, Deva},


The CUHK Face Sketch database (CUFS) is for research on face sketch synthesis and face sketch recognition. It includes 188 faces from the Chinese University of Hong Kong (CUHK) student database, 123 faces from the AR database [1], and 295 faces from the XM2VTS database [2]. There are 606 faces in total. For each face, there is a sketch drawn by an artist based on a photo taken in a frontal pose, under normal lighting conditions, and with a neutral expression.

Sketch Dataset


  • TBA


***Datasets for Cartoon Images******

It is extracted from the public DCM772 comic book dataset. DCM772 is composed of 772 annotated images from 27 Golden Age comic books, collected from the free public-domain collection of digitized comic books at the Digital Comics Museum. The ground-truth annotations contain bounding boxes for panels and comic characters (body + faces), segmentation masks for balloons, and links between balloons and characters.



  • TBA


***Diagrams Dataset******

  • AI2D: a dataset of illustrative diagrams for diagram understanding

AI2D is a dataset of illustrative diagrams for research on diagram understanding and associated question answering. Each diagram has been densely annotated with object segmentations, diagrammatic and text elements. Each diagram has a corresponding set of questions and answers.

Abstract: Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.

Kembhavi, A., Salvato, M., Kolve, E., Seo, M., Hajishirzi, H., & Farhadi, A. (2016, October). A diagram is worth a dozen images. In European Conference on Computer Vision (pp. 235-251). Springer, Cham.

Citation bibtex:

author={Aniruddha Kembhavi and Mike Salvato and Eric Kolve and Minjoon Seo and Hannaneh Hajishirzi and Ali Farhadi},
title={A Diagram Is Worth A Dozen Images},
booktitle={European Conference on Computer Vision (ECCV)},

  • TBA

***3D datasets******

Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T., & Nießner, M. (2017). ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5828-5839).





The CVPR 2019 workshop that used the ScanNet dataset:

  • TBA

=====Text datasets======



=====More datasets can be found from the sources below======

For researchers in computer vision and image processing:

There are 670+ datasets listed on CVonline.

Image/video database categories:
Action Databases
Attribute recognition
Autonomous Driving
Camera calibration
Face and Eye/Iris Databases
General Images
General RGBD and depth datasets
General Videos
Hand, Hand Grasp, Hand Action and Gesture Databases
Image, Video and Shape Database Retrieval
Object Databases
People (static), human body pose
People Detection and Tracking Databases (See also Surveillance)
Remote Sensing
Scene or Place Segmentation or Classification
Simultaneous Localization and Mapping
Surveillance (See also People)
Urban Datasets
Other Collection Pages
Miscellaneous Topics



  • WACV (IEEE Winter Conference on Applications of Computer Vision)


  • TBA



[Paper published] Check out our new (deep) machine learning paper for flood detection

New machine/deep learning paper led by Liping: Analysis of remote sensing imagery for disaster assessment using deep learning: a case study of flooding event

A full-text view-only version of the paper can be found via the link:


AI vs. Machine Learning vs. Deep Learning

(Stay tuned, I keep updating this post while I plow in my deep learning garden:))

in category: Machine Learning vs Deep Learning

*****The following slide is from Transfer Learning and Fine-tuning Deep Neural Networks (Sep 2, 2016, by Anusua Trivedi, Data Scientist @ Microsoft)

*****The following slide is from Prof. Andrew Ng’s talk “Machine Learning and AI via Brain simulations” (PDF) at Stanford University.

*****The following slide is from the lecture talk “How Could Machines Learn as Efficiently as Animals and Humans?” (December 12, 2017) given by Prof. Yann LeCun, Director of Facebook AI Research and Silver Professor of Computer Science at New York University.

*****Below is an excerpt from What is deep learning? (by Jason Brownlee, August 16, 2016)

The core of deep learning according to Andrew Ng is that we now have fast enough computers and enough data to actually train large neural networks. When discussing why now is the time that deep learning is taking off at ExtractConf 2015 in a talk titled “What data scientists should know about deep learning”, he commented:

very large neural networks we can now have and … huge amounts of data that we have access to

He also commented on the important point that it is all about scale. That as we construct larger neural networks and train them with more and more data, their performance continues to increase. This is generally different to other machine learning techniques that reach a plateau in performance.

for most flavors of the old generations of learning algorithms … performance will plateau. … deep learning … is the first class of algorithms … that is scalable. … performance just keeps getting better as you feed them more data

Dr. Andrew Ng provides a nice plot  in his slides:

(Source: Ng, A. What Data Scientists Should Know about Deep Learning (see slide 30 of 34), 2015)

*****The relations between AI, Machine Learning, and Deep Learning

“Deep learning is a subset of machine learning, and machine learning is a subset of AI, which is an umbrella term for any computer program that does something smart. In other words, all machine learning is AI, but not all AI is machine learning, and so forth.” (check here for source.)

Below is a short excerpt from the source: The AI Revolution: Why Deep Learning Is Suddenly Changing Your Life (by Roger Parloff, illustration by Justin Metz, September 28, 2016)

Think of deep learning as a subset of a subset. “Artificial intelligence” encompasses a vast range of technologies—like traditional logic and rules-based systems—that enable computers and robots to solve problems in ways that at least superficially resemble thinking. Within that realm is a smaller category called machine learning, which is the name for a whole toolbox of arcane but important mathematical techniques that enable computers to improve at performing tasks with experience. Finally, within machine learning is the smaller subcategory called deep learning.

A detailed explanation of this nested relationship can be found in the post “Understanding the differences between AI, machine learning, and deep learning” (by Hope Reese, February 23, 2017).

======Below are some main screenshots from this talk: Watch Now: Deep Learning Demystified





References and reading list: