[Paper published] Check out our new computer vision and image analysis paper for skeleton extraction

A new CVPR 2019 Workshop paper on computer vision and image analysis, led by Liping, has been published:

A Novel Algorithm for Skeleton Extraction From Images Using Topological Graph Analysis. 

A PDF of the paper can be found HERE. (Check HERE if it is not retrievable from http://openaccess.thecvf.com) [Acceptance rate: 10/32 = 31.25%]

Yang, L. and Worboys, M. Generation of navigation graphs for indoor space. International Journal of Geographical Information Science, 29(10): 1737-1756, 2015. [Click here (PDF) to download a draft of this paper]


Check out this page for more of Liping’s publications.

WACV (IEEE Winter Conference on Applications of Computer Vision)


  • Winter Conference on Applications of Computer Vision (WACV) is the IEEE’s and the PAMI-TC’s premier meeting on applications of computer vision.
  • WACV is a full-fledged IEEE conference which covers all areas of computer vision.
  • It is always held in the USA.

Current and Past WACV Conferences

  • WACV 2020 (held March, 2020 at The Westin Snowmass Resort in Snowmass Village, Colorado, USA)


  • WACV 2019 (held January, 2019 at Hilton Waikoloa Village, Hawaii, USA)



  • WACV 2018 (held March, 2018 at Harvey’s Casino in Lake Tahoe, NV/CA)


  • WACV 2017 (held on March 27-29, 2017 at the Hyatt Vineyard Creek Hotel, Santa Rosa, CA, USA)


  • WACV 2016 (held March 7-9, 2016 at the Crowne Plaza resort in Lake Placid, NY, USA)



  • WACV 2015 (held January 6-9, 2015 at Waikoloa Beach Marriott Resort & Spa, Big Island, Hawaii)

      • WACV2015 Main conference (People)

      • Acceptance rate

        • Paper acceptance statistics:
          425 papers submitted
          156 accepted (35 in round 1, 121 in round 2)
          36.7% acceptance rate


  • TBA

Dataset collection for (deep) machine learning and computer vision

This page provides a collection of (image and text) datasets for (deep) machine learning and computer vision problems.

=====Image datasets======

***Dataset for Natural Images******

ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. The creators of the dataset hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
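The WordNet-style hierarchy described above can be sketched as a simple tree of synset nodes. This is only an illustrative sketch — the ids, names, and image counts below are made up, not real ImageNet data:

```python
# Minimal sketch of an ImageNet-style hierarchy: each node is a WordNet-like
# synset holding its own images plus child synsets; counting images at a node
# aggregates everything beneath it, mirroring how ImageNet populates nodes.

class SynsetNode:
    def __init__(self, wnid, name):
        self.wnid = wnid      # WordNet-style id, e.g. "n02084071" (illustrative)
        self.name = name      # human-readable gloss, e.g. "dog"
        self.images = []      # image paths/URLs attached directly to this node
        self.children = []    # narrower synsets

    def add_child(self, node):
        self.children.append(node)
        return node

    def total_images(self):
        """Number of images in this node's entire subtree."""
        return len(self.images) + sum(c.total_images() for c in self.children)

# Toy hierarchy with a handful of fake images
animal = SynsetNode("n00015388", "animal")
dog = animal.add_child(SynsetNode("n02084071", "dog"))
cat = animal.add_child(SynsetNode("n02121620", "cat"))
dog.images = [f"dog_{i}.jpg" for i in range(3)]
cat.images = [f"cat_{i}.jpg" for i in range(2)]

print(animal.total_images())  # → 5 (all descendants' images)
```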

    • TBA


***Dataset for Sketch images******



Selected papers used the dataset:

Xu, P., Huang, Y., Yuan, T., Pang, K., Song, Y. Z., Xiang, T., … & Guo, J. (2018). SketchMate: Deep hashing for million-scale human sketch retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8090-8098). (PDF)

Try the demo!

Citation bibtex:

@article{eitz2012sketch,
  author={Eitz, Mathias and Hays, James and Alexa, Marc},
  title={How Do Humans Sketch Objects?},
  journal={ACM Trans. Graph. (Proc. SIGGRAPH)},
  year={2012},
  volume={31},
  number={4},
  pages={44:1--44:10}
}


We investigate the problem of fine-grained sketch-based image retrieval (SBIR), where free-hand human sketches are used as queries to perform instance-level retrieval of images. This is an extremely challenging task because (i) visual comparisons not only need to be fine-grained but also executed cross-domain, (ii) free-hand (finger) sketches are highly abstract, making fine-grained matching harder, and most importantly (iii) annotated cross-domain sketch-photo datasets required for training are scarce, challenging many state-of-the-art machine learning techniques.

In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based image retrieval application. We introduce a new database of 1,432 sketch-photo pairs from two categories with 32,000 fine-grained triplet ranking annotations. We then develop a deep triplet ranking model for instance-level SBIR with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data. Extensive experiments are carried out to contribute a variety of insights into the challenges of data sufficiency and overfitting avoidance when training deep networks for fine-grained cross-domain ranking tasks.
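The triplet ranking objective behind this kind of model can be illustrated with a minimal sketch. The toy embedding vectors below are made up for illustration; a real system would produce them with a deep network, and this is not the authors' implementation:

```python
import numpy as np

# Triplet ranking for SBIR, in sketch form: embed a query sketch (anchor),
# its matching photo (positive) and a non-matching photo (negative), then
# require the anchor to sit closer to the positive by at least a margin.

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances."""
    d_pos = np.sum((anchor - positive) ** 2)   # anchor-to-positive distance
    d_neg = np.sum((anchor - negative) ** 2)   # anchor-to-negative distance
    return max(0.0, d_pos - d_neg + margin)    # zero once the margin holds

# Toy embeddings: the true photo is near the sketch, the other photo is far.
sketch = np.array([0.0, 1.0])
true_photo = np.array([0.1, 0.9])
other_photo = np.array([2.0, 0.0])

print(triplet_loss(sketch, true_photo, other_photo))  # 0.0: ranking satisfied
```

During training, such a loss is averaged over many (sketch, photo+, photo-) triplets, which is why the annotations above take the form of triplet rankings rather than plain labels.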

Database   New Dataset! (ShoeV2: 2000 photos + 6648 sketches)


Citation bibtex:

@inproceedings{yu2016sketch,
  title={Sketch Me That Shoe},
  author={Yu, Qian and Liu, Feng and Song, Yi-Zhe and Xiang, Tao and Hospedales, Timothy and Loy, Chen Change},
  booktitle={Computer Vision and Pattern Recognition},
  year={2016}
}

Results updated: on the Shoes dataset, acc.@1 is 52.17%; on the Chairs dataset, acc.@1 is 72.16%. Please find further details here (Extra comment 1).

code   |  Demo:   Try the demo!


Citation bibtex:

@article{sangkloy2016sketchy,
  author = {Patsorn Sangkloy and Nathan Burnell and Cusuh Ham and James Hays},
  title = {The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies},
  journal = {ACM Transactions on Graphics (proceedings of SIGGRAPH)},
  year = {2016}
}

The authors present a new dataset of paired images and contour drawings for the study of visual understanding and sketch generation. In this dataset, there are 1,000 outdoor images and each is paired with 5 human drawings (5,000 drawings in total). The drawings have strokes roughly aligned for image boundaries, making it easier to correspond human strokes with image edges.

The dataset was collected with Amazon Mechanical Turk: the Turkers were asked to trace over a faded background image. To obtain high-quality annotations, the authors designed a labeling interface with a detailed instruction page including many positive and negative examples. Quality control was realized through manual inspection, treating annotations of the following types as rejection candidates: (1) missing inner boundaries, (2) missing important objects, (3) large misalignment with the original edges, (4) unrecognizable content, (5) humans drawn as stick figures, (6) shading on empty areas. In addition to the 5,000 accepted drawings, there are therefore 1,947 rejected submissions, which can be used to set up an automatic quality guard.

License: the dataset is licensed under CC BY-NC-SA (Attribution-NonCommercial-ShareAlike). That means you can use this dataset for non-commercial purposes, and your adapted work should be shared under similar conditions.


Citation bibtex:

@inproceedings{li2019photo,
  title={Photo-Sketching: Inferring Contour Drawings from Images},
  author={Li, Mengtian and Lin, Zhe and M\v ech, Radom\'ir and Yumer, Ersin and Ramanan, Deva},
  booktitle={WACV},
  year={2019}
}


CUHK Face Sketch database (CUFS) is for research on face sketch synthesis and face sketch recognition. It includes 188 faces from the Chinese University of Hong Kong (CUHK) student database, 123 faces from the AR database [1], and 295 faces from the XM2VTS database [2]. There are 606 faces in total. For each face, there is a sketch drawn by an artist based on a photo taken in a frontal pose, under normal lighting condition, and with a neutral expression.

Sketch Dataset


  • TBA


***Datasets for Cartoon Images******

The DCM772 public dataset is composed of 772 annotated images extracted from 27 golden-age comic books, freely collected from the Digital Comics Museum (https://digitalcomicmuseum.com), a public-domain collection of digitized comic books. The ground-truth annotations contain bounding boxes for panels and comic characters (bodies and faces), segmentation masks for balloons, and links between balloons and characters.



  • TBA


***Diagrams Dataset******

  • AI2D — a dataset of illustrative diagrams for diagram understanding (Download the dataset HERE, paper PDF)

AI2D is a dataset of illustrative diagrams for research on diagram understanding and associated question answering. Each diagram has been densely annotated with object segmentations, diagrammatic and text elements. Each diagram has a corresponding set of questions and answers.

Abstract: Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.

Kembhavi, A., Salvato, M., Kolve, E., Seo, M., Hajishirzi, H., & Farhadi, A. (2016, October). A diagram is worth a dozen images. In European Conference on Computer Vision (pp. 235-251). Springer, Cham.

Citation bibtex:

@inproceedings{kembhavi2016diagram,
  author={Aniruddha Kembhavi and Mike Salvato and Eric Kolve and Minjoon Seo and Hannaneh Hajishirzi and Ali Farhadi},
  title={A Diagram Is Worth A Dozen Images},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2016}
}

  • TBA

***3D datasets******

Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T., & Nießner, M. (2017). ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5828-5839).





The CVPR 2019 workshop used the ScanNet dataset:

  • TBA

=====Text datasets======



=====More datasets can be found from the sources below======

For researchers in computer vision & Image processing:

There are 670+ datasets listed on CVonline.

Image/video database categories:
Action Databases
Attribute recognition
Autonomous Driving
Camera calibration
Face and Eye/Iris Databases
General Images
General RGBD and depth datasets
General Videos
Hand, Hand Grasp, Hand Action and Gesture Databases
Image, Video and Shape Database Retrieval
Object Databases
People (static), human body pose
People Detection and Tracking Databases (See also Surveillance)
Remote Sensing
Scene or Place Segmentation or Classification
Simultaneous Localization and Mapping
Surveillance (See also People)
Urban Datasets
Other Collection Pages
Miscellaneous Topics







  • WACV (IEEE Winter Conference on Applications of Computer Vision)


  • TBA



Computer Vision (CV) Resources: CV Terms and Algs

This page provides some fundamental and essential computer vision (CV) related terms, concepts, and algorithms.


In computing, indexed color is a technique to manage digital images' colors in a limited fashion, in order to save computer memory and file storage, while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements. Every element in the array represents a color, indexed by its position within the array. The individual entries are sometimes known as color registers. Each image pixel does not contain the full specification of its color, but only its index into the palette. This technique is sometimes referred to as pseudocolor[1] or indirect color,[2] as colors are addressed indirectly.
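A minimal sketch of this palette lookup, using NumPy and a made-up 4-entry palette:

```python
import numpy as np

# Indexed color in sketch form: pixels store palette indices, not RGB values.
# A 4-entry palette (the "color registers") and a tiny 2x3 indexed image.
palette = np.array([
    [0, 0, 0],        # index 0: black
    [255, 0, 0],      # index 1: red
    [0, 255, 0],      # index 2: green
    [255, 255, 255],  # index 3: white
], dtype=np.uint8)

indexed = np.array([[0, 1, 2],
                    [3, 1, 0]], dtype=np.uint8)  # one byte per pixel

# Decoding is a single table lookup: NumPy fancy indexing expands each
# pixel's index into the palette's full RGB triple.
rgb = palette[indexed]

print(rgb.shape)   # (2, 3, 3): height x width x RGB
print(rgb[0, 1])   # [255   0   0] -> the red palette entry
```

The memory saving is visible in the shapes: the indexed image stores one byte per pixel plus a fixed-size palette, instead of three bytes per pixel.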

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland and Cheadle.[3][4] This supported a palette of 256 36-bit RGB colors.


An image retrieval system is a computer system for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.

The first microcomputer-based image database retrieval system was developed at MIT, in the 1990s, by Banireddy Prasaad, Amar Gupta, Hoo-min Toong, and Stuart Madnick.[1]

A 2008 survey article documented progress after 2007.[2]

CBIR — the application of computer vision to image retrieval. CBIR aims at avoiding the use of textual descriptions and instead retrieves images based on similarities in their contents (textures, colors, shapes, etc.) to a user-supplied query image or user-specified image features.
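The idea can be sketched with one of the simplest content features, a color histogram. The feature and similarity choices below are illustrative only; real CBIR systems use much richer descriptors:

```python
import numpy as np

# Minimal CBIR sketch: describe each image by a normalized joint RGB
# histogram and rank database images by histogram intersection with the
# query's histogram (higher intersection = more similar content).

def color_histogram(img, bins=4):
    """Joint RGB histogram with `bins` levels per channel, L1-normalized."""
    q = (img.astype(np.int64) * bins) // 256            # quantize each channel
    codes = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def retrieve(query, database, bins=4):
    """Return database indices sorted best-match first."""
    qh = color_histogram(query, bins)
    scores = [np.minimum(qh, color_histogram(img, bins)).sum() for img in database]
    return np.argsort(scores)[::-1]

# Toy data: a reddish query should rank the reddish database image first.
rng = np.random.default_rng(0)
reddish = rng.integers(0, 80, (16, 16, 3), dtype=np.uint8); reddish[..., 0] += 150
bluish = rng.integers(0, 80, (16, 16, 3), dtype=np.uint8); bluish[..., 2] += 150
query = reddish.copy()

print(retrieve(query, [bluish, reddish]))  # [1 0]: reddish (index 1) first
```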

List of CBIR Engines – a list of engines which search for images based on image visual content such as color, texture, shape/object, etc.

Further information: Visual search engine and Reverse image search

Image collection exploration is a mechanism to explore large digital image repositories. The huge number of digital images produced every day through different devices such as mobile phones brings challenges for the storage, indexing, and access to these repositories. Content-based image retrieval (CBIR) has been the traditional paradigm to index and retrieve images. However, this paradigm suffers from the well-known semantic gap problem. Image collection exploration consists of a set of computational methods to represent, summarize, visualize, and navigate image repositories in an efficient, effective, and intuitive way.[1]


Automatic summarization consists of finding a subset of images from a larger image collection that represents the whole collection.[2] Different clustering-based methods have been proposed to select these image prototypes (the summary). The summarization process addresses the problem of selecting a representative set of images for a search query or, in some cases, the overview of an image collection.
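A clustering-based prototype selection can be sketched as follows. The "image features" here are random stand-in vectors, and the tiny k-means is a simplification of what real summarization methods use:

```python
import numpy as np

# Summarization sketch: cluster image feature vectors with a small k-means,
# then pick as the summary the one image closest to each cluster centre.

def kmeans(X, k, iters=20):
    # Farthest-point initialization keeps this toy example deterministic.
    centres = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(-1) for c in centres], axis=0)
        centres.append(X[np.argmax(dists)])
    centres = np.array(centres)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

def summarize(features, k):
    """Indices of the k images nearest their cluster centres (the summary)."""
    centres, labels = kmeans(features, k)
    summary = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            d = ((features[members] - centres[j]) ** 2).sum(-1)
            summary.append(int(members[np.argmin(d)]))
    return sorted(summary)

# 30 fake feature vectors drawn from 3 well-separated groups of 10.
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(c, 0.1, (10, 8)) for c in (0.0, 5.0, 10.0)])
print(summarize(feats, k=3))  # one representative index per group
```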



Computer Vision (CV) resources

This page provides some useful resources about computer vision (CV).

(Stay tuned, as I will update the content on this page while I plow and grow in my deep learning garden :))

  • Books

  • Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media. (book website; PDF link) .    [MIT, UCBerkeley, Princeton CV course primary textbook]
  • Forsyth, D. A., & Ponce, J. (2012). Computer Vision: A Modern Approach (2nd Edition). Prentice Hall. The first edition with the same title was published in 2003.     [MIT, UCBerkeley CV course secondary textbook]
  • Prince, S. J. (2012). Computer vision: models, learning, and inference (book website; PDF link). Cambridge University Press.
  • Fisher, R. B., Breckon, T. P., Dawson-Howe, K., Fitzgibbon, A., Robertson, C., Trucco, E., & Williams, C. K. (2014). Dictionary of computer vision and image processing. John Wiley & Sons.
  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge university press.
  • Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. MIT Press. Cambridge, Massachusetts. (book website)
  • Bishop, C. (2006).Pattern Recognition and Machine Learning, Springer. (book website, PDF link)
  • Wilson, J. N., & Ritter, G. X. (2000). Handbook of computer vision algorithms in image algebra (2nd edition). CRC press.
  • Top and major CV conferences 

  • Top and major CV journals

  • IEEE PAMI (Transactions on Pattern Analysis and Machine Intelligence) —  (1979 – present, monthly) [SCI indexed] (widely regarded as the best journal in computer vision)
  • IJCV (International Journal of Computer Vision)– (1987 – present) [SCI indexed; Springer]

(IJCV is typically considered on par with T-PAMI, which is the best journal in computer vision. Review time should be ~4-5 months for the first set of reviews and ~12-14 months to final publication. A good way to get an estimate of time is to browse through the recently published articles here – they list submitted date and publication date.)


References and Further Reading List