Open Images dataset

Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

The annotations are licensed by Google Inc. under CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.

The images are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

_{Annotated images from the Open Images dataset. Left: FAMILY MAKING A SNOWMAN by mwvchamber. Right: STANZA STUDENTI.S.S. ANNUNZIATA by ersupalermo. Both images used under CC BY 2.0 license.
Browse the validation set here and the train set here, courtesy of Common Visual Data Foundation (CVDF).}

Announcements

07-20-2017 V2 data is released! The dataset now includes 2M bounding boxes spanning 600 object classes (1.24M in train, 830K in validation+test), and 4.3M human-verified positive image-level labels on the training set. Changelist. Coming soon: Trained models (both image-level and object detectors).
07-20-2017 V2 data visualizer, courtesy of Common Visual Data Foundation (CVDF).

Download the data

Image URLs and metadata (990 MB)
Bounding box annotations (train, validation, and test sets) (37 MB)
Human-verified image-level annotations (train, validation, and test sets) (66 MB)
Machine-generated image-level annotations (train, validation, and test sets) (447 MB)
Classes and class descriptions (293 KB)

See also how to import the annotations into PostgreSQL.

Data organization

The dataset is split into a training set (9,011,219 images), a validation set (41,620 images), and a test set (125,436 images). The validation of V1 set was partitioned into validation and test in the V2 release. This is intended to make evaluations more tractable. The images are annotated with image-level labels and bounding boxes as described below.

Image-level labels

Table 1 shows an overview of the image-level labels in all splits of the dataset. All images have image-level labels automatically generated by a computer vision model similar to Google Cloud Vision API.

Table 1: Image-level labels.

	Train	Validation	Test	# Classes	# Trainable Classes
Images	9,011,219	41,620	125,436	-	-
Machine Generated Labels	78,977,695	512,093	1,545,835	7,870	4,966
Human Verified Labels	9,376,588 pos: 4,303,594 neg: 5,072,994	559,873 pos: 374,778 neg: 185,095	1,693,466 pos: 1,132,567 neg: 560,899	19,661	5,000

Moreover, all images in the validation and test sets as well as part of the training set have human-verified image-level labels. Most of the human verifications have been done with in-house annotators at Google. A smaller part has been done with crowd-sourced verification from Image Labeler: Crowdsource app, g.co/imagelabeler. This human verification process allows to practically eliminate false positives (but not false negatives, so some labels might be missing from an image). A variety of computer vision models were used to generate the samples (not just the one used for the machine-generated labels above) which is why the vocabulary is significantly expanded (#classes column in Table 1).

Overall, there are 19868 distinct classes with image-level labels (19661 have at least one human-verified sample and 7870 have a sample in the machine-generated pool). Of these, 5000 classes are considered trainable, as they have at least 30 human-verified samples in the training set and 5 in the validation or test sets. Classes are identified by MIDs (Machine-generated Ids) as can be found in Freebase or Google Knowledge Graph API. A short description of each class is available in class-descriptions.csv.

Each annotation has a confidence number from 0.0 to 1.0 assigned. Confidences for the human-verified labels are binary (either positive, 1.0 or negative, 0.0). Machine-generated labels have fractional confidences, generally >= 0.5. The higher the confidence, the smaller chance for the label to be a false positive.

Bounding boxes

Table 2 shows an overview of the bounding box annotations in all splits of the dataset, which span 600 object classes. These offer a broader range than those in the ILSVRC and COCO detection challenges, and include new objects such as fedora hat and snowman.

Table 2: Boxes.

	Train	Validation	Test	# Classes	# Trainable Classes
Images	669,124	41,620	125,436	-	-
Boxes	1,240,316	204,621	625,282	600	545

We provide complete bounding box annotation for all object instances on the validation and test sets, all manually drawn by human annotators at Google. This was done for all image-level labels that have been positively verified by a human (see Table 1). Moreover, annotators also marked a set of attributes for each box, e.g. indicating whether that object is occluded. On average, there are about 5 boxes per image in the validation and test sets.

We produced bounding boxes in the training set semi-automatically using an enhanced version of the method described in "We don't need no bounding-boxes: Training object class detectors using only human verification", Papadopolous et al., CVPR 2016. This process iterates between detecting objects, asking humans to verify the current detections (boxes), and updating the detector. This enables to localize difficult objects that were missed in early iterations. We run it for 3-4 iterations, which produced boxes for about 70% of all image-level labels (i.e. those that have been positively verified by a human in Table 1). If there are multiple objects of the same class in one image, only one has been boxed. All boxes have been human verified to have IoU > 0.7 with the perfect box, and in practice they are very accurate (mean IoU around 0.82). We deliberately did not annotate human body parts for 80% of the training set due to the overwhelming number of instances. On average, there are about 2 boxes per image in the training set.

Overall, there are 600 distinct classes with a bounding box attached to at least one image. Of these, 545 classes are considered trainable (the intersection of the 600 boxable classes with the 5000 image-level trainable classes).

Data Formats

The data tarballs contain the following files:

images.csv

There's one such file for each subset inside train, validation, and test subdirectories respectively. It has image URLs, their OpenImages IDs, titles, authors and license information:

ImageID,Subset,OriginalURL,OriginalLandingURL,License,AuthorProfileURL,Author,Title,\
OriginalSize,OriginalMD5,Thumbnail300KURL
...
000060e3121c7305,train,https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg,\
https://www.flickr.com/photos/brokentaco/5215831864,\
https://creativecommons.org/licenses/by/2.0/,\
"https://www.flickr.com/people/brokentaco/","David","28 Nov 2010 Our new house."\
211079,0Sad+xMj2ttXM1U8meEJ0A==,https://c1.staticflickr.com/5/4129/5215831864_ee4e8c6535_z.jpg
...

Each image has a unique 64-bit ID assigned. In the CSV files they appear as zero-padded hex integers, such as 000060e3121c7305.

The data is as it appears on the destination websites.

OriginalSize is the download size of the original image.
OriginalMD5 is base64-encoded binary MD5, as described here.
Thumbnail300KURL is an optional URL to a thumbnail with ~300K pixels (~640x480). It's provided for the convenience of downloading the data in the absence of more convenient ways to get the images. If missing, the OriginalURL must be used (and then resized to the same size, if needed). Beware: these thumbnails are generated on the fly and their contents and even resolution might be different every day.

annotations-machine.csv

Machine-generated image-level labels (one file each for train, validation, and test):

ImageID,Source,LabelName,Confidence
000002b66c9c498e,machine,/m/05_4_,0.7
000002b66c9c498e,machine,/m/0krfg,0.7
000002b66c9c498e,machine,/m/01kcnl,0.5
000002b97e5471a0,machine,/m/05_5t0l,0.9
000002b97e5471a0,machine,/m/0cgh4,0.8
000002b97e5471a0,machine,/m/0dx1j,0.8
000002b97e5471a0,machine,/m/039jbq,0.8
000002b97e5471a0,machine,/m/03nfmq,0.8
000002b97e5471a0,machine,/m/03jm5,0.7
...

These were generated from a computer vision model similar to Google Cloud Vision API.

annotations-human.csv

Human-verified image-level labels (one file each for train, validation, and test):

ImageID,Source,LabelName,Confidence
000002b66c9c498e,human,/m/014l8n,0
000002b66c9c498e,human,/m/017ycn,0
000002b66c9c498e,human,/m/018tkd,0
000002b66c9c498e,human,/m/019_nn,1
000002b66c9c498e,human,/m/01_qs1,0
000002b66c9c498e,human,/m/01b7zv,0
000002b66c9c498e,human,/m/01bsxb,1
000002b66c9c498e,human,/m/01c5nlx,1
000002b66c9c498e,human,/m/01d07t,0
...

annotations-human-bbox.csv

Human provided labels with bounding box coordinates (one file each for train, validation, and test).

For the train set labels and box coordinates are provided:

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax
000002b66c9c498e,human,/m/01g317,1,0.602,0.766,0.309,0.489
000002b66c9c498e,human,/m/01g317,1,0.611,0.750,0.287,0.469
000002b66c9c498e,human,/m/01mzpv,1,0.009,0.119,0.750,0.920
000002b66c9c498e,human,/m/01mzpv,1,0.019,0.098,0.767,0.892
000002b66c9c498e,human,/m/0270h,1,0.522,0.917,0.675,0.966
000002b66c9c498e,human,/m/0284d,1,0.522,0.917,0.675,0.966
000002b66c9c498e,human,/m/0284d,1,0.560,0.951,0.696,1.000
000002b66c9c498e,human,/m/02p0tk3,1,0.031,0.448,0.697,0.967
000002b66c9c498e,human,/m/02wbm,1,0.522,0.917,0.675,0.966
...

For the validation and test sets additional attributes are also given:

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
000e4e7ed48c932d,human,/m/05s2s,1,0.000563,1.000000,0.000000,0.999128,0,1,1,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.822343,1.000000,0.000000,0.222423,1,1,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.466559,0.761525,0.124262,0.560279,1,0,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.000000,0.082646,0.249817,0.386786,1,1,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.000000,0.214165,0.321725,0.683551,0,0,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.200481,0.316795,0.615067,0.728066,0,0,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.052237,0.229369,0.853620,1.000000,1,1,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.553224,0.780531,0.865034,1.000000,1,1,0,0,0
000e4e7ed48c932d,human,/m/0c9ph5,1,0.498488,0.583633,0.483805,0.542017,1,0,0,0,0
...

The attributes have the following definitions:

IsOccluded: Indicates that the object is occluded by another object in the image.
IsTruncated: Indicates that the object extends beyond the boundary of the image.
IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching.
IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance).
IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building).

class-descriptions.csv

The label MIDs can be converted to their short descriptions by looking into class-descriptions.csv:

...
/m/025dyy,Box
/m/025f_6,Dussehra
/m/025fh,Professor x
/m/025fnn,Savannah Sparrow
/m/025fsf,Stapler
/m/025gg7,Jaguar x-type
/m/02_5h,Figure skating
/m/025_h00,Solid-state drive
/m/025_h88,White tailed prairie dog
/m/025_hbp,Mercury monterey
/m/025h_m,Yellow rumped Warbler
/m/025khl,Spätzle
...

Note the presence of characters like commas and quotes. The file follows standard csv escaping rules. E.g.,

/m/02wvth,"Fiat 500 ""topolino"""
/m/03gtp5,Lamb's quarters
/m/03hgsf0,"Lemon, lime and bitters"

classes.txt

The list of 19868 image-level classes:

/m/0100nhbf
/m/0104x9kv
/m/0105jzwx
/m/0105ld7g
/m/0105lxy5
/m/0105n86x
/m/0105ts35
/m/0108_09c
/m/01_097
/m/010dmf
...

classes-trainable.txt

The list of 5000 trainable image-level classes.

classes-bbox.txt

The list of 600 box-level classes.

classes-bbox-trainable.txt

The list of 545 trainable box-level classes.

Statistics and data analysis

Browse the data

Browse the bounding box groundtruth here, courtesy of CVDF.
View the set of boxable labels here:

Label distribution

The following figures show the distribution of annotations across the dataset. Notice that the class distribution is heavily skewed (note: the y-axis is on a log-scale). Labels are ordered by number of positive examples, then by number of negative examples. Green indicates positive examples while red indicates negatives.

Citations

If you use the OpenImages dataset in your work, please cite it as:

APA-style citation: "Krasin I., Duerig T., Alldrin N., Ferrari V., Abu-El-Haija S., Kuznetsova A., Rom H., Uijlings J., Popov S., Veit A., Belongie S., Gomes V., Gupta A., Sun C., Chechik G., Cai D., Feng Z., Narayanan D., Murphy K. OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages".

BibTeX

@article{openimages,
  title={OpenImages: A public dataset for large-scale multi-label and multi-class image classification.},
  author={Krasin, Ivan and Duerig, Tom and Alldrin, Neil and Ferrari, Vittorio and Abu-El-Haija, Sami and Kuznetsova, Alina and Rom, Hassan and Uijlings, Jasper and Popov, Stefan and Veit, Andreas and Belongie, Serge and Gomes, Victor and Gupta, Abhinav and Sun, Chen and Chechik, Gal and Cai, David and Feng, Zheyun and Narayanan, Dhyanesh and Murphy, Kevin},
  journal={Dataset available from https://github.com/openimages},
  year={2017}
}

jkurz/dataset