
AI2-Thor Image Dataset

Overview

image_dataset.py will generate a 1.3 million image dataset of objects classified by ImageNet synset IDs.

It can be compared to the TDW use case controller single_object.py.

Requirements

  • Ubuntu
  • Python 3.6+
    • ai2thor
    • numpy
    • pillow
    • tqdm
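
The Python modules can be installed with pip, e.g. (assuming pip3 points at your Python 3 installation):

pip3 install ai2thor numpy pillow tqdm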

Usage

python3 image_dataset.py [ARGUMENTS]
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| --dir | str | ai2thor_image_dataset | Root output directory, relative to <home>/ |
| --new | flag | | If included, delete any existing dataset at the output directory. |
| --accept_all | flag | | If included, save an image labeled as an object whenever any pixels in the segmentation mask match that object's segmentation color (rather than requiring the 1% threshold described below). |
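
For example, to delete any existing dataset and regenerate it under a custom directory (a hypothetical directory name):

python3 image_dataset.py --dir my_image_dataset --new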

How It Works

  • Launch an AI2-Thor controller and initialize a scene.
  • For 100 steps, apply a random movement or rotation action.
  • Parse the segmentation mask returned by the step. For each object occupying >=1% of the pixels in the segmentation mask, save that image (a capture-loop sketch follows this list).
    • This means that the same image may be saved multiple times, e.g. an image with a desk and a chair can be saved as desk_0000.jpg and chair_0000.jpg.
    • Images are saved to train/ and val/ subdirectories named by wnid (ImageNet synset ID), using the file object_types.csv, which maps each AI2-Thor object type to a wnid (source).
    • The target number of images per wnid is the total number of images divided by the total number of wnids.
  • After 100 steps, initialize a new scene and repeat.
    • Evaluate which scene types are still valid using data from scenes_and_objects.json. Each key is an AI2-Thor "scene type" and each value is a list of AI2-Thor object types in the scene (source). If we already have enough images of every object type in a scene type, skip all scenes of that type (sketched below).
  • image_dataset.py starts reasonably fast while capturing larger objects, then slows down as it becomes harder to find smaller objects, to the point where generating the whole dataset would take weeks. To handle this, if it took 3 or more seconds to capture an image over a span of 100 steps, the pixel percent threshold is reduced from 1% to 0% (meaning that an image is accepted if any pixels in the segmentation mask match the object's segmentation color).
  • It is possible to stop and restart image_dataset.py. When stopped, it writes a progress.json save file to the dataset output directory. This file contains the number of images generated so far per wnid, as well as scene_index, the index of the current scene in the array of scenes (a resume sketch follows this list).
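
A minimal sketch of the capture loop described above, assuming a recent ai2thor release whose Controller constructor accepts renderInstanceSegmentation and whose events expose instance_segmentation_frame and color_to_object_id (older releases pass renderObjectImage to an Initialize action instead). The action list, threshold constant, and file naming are illustrative, not copied from image_dataset.py:

```python
import random

import numpy as np
from PIL import Image
from ai2thor.controller import Controller

# Illustrative values; image_dataset.py's actual actions and quotas differ.
ACTIONS = ["MoveAhead", "MoveBack", "RotateLeft", "RotateRight"]
# image_dataset.py lowers this threshold to 0 when capture slows down.
PIXEL_PERCENT_THRESHOLD = 0.01

controller = Controller(scene="FloorPlan1", renderInstanceSegmentation=True)
for step_num in range(100):
    event = controller.step(action=random.choice(ACTIONS))
    seg = event.instance_segmentation_frame  # (H, W, 3) uint8 array
    total_pixels = seg.shape[0] * seg.shape[1]
    # color_to_object_id maps an RGB tuple to an object ID such as
    # "Laptop|+01.50|+00.90|-01.00"; the object type precedes the first "|".
    for color, object_id in event.color_to_object_id.items():
        mask = np.all(seg == np.array(color), axis=-1)
        if mask.sum() / total_pixels >= PIXEL_PERCENT_THRESHOLD:
            object_type = object_id.split("|")[0]
            # image_dataset.py maps object_type to a wnid via object_types.csv
            # and tracks per-wnid counts; here we just save by type and step.
            Image.fromarray(event.frame).resize((256, 256)).save(
                f"{object_type}_{step_num:04d}.jpg")
controller.stop()
```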
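
The scene-type filtering is simple bookkeeping. A hedged sketch, assuming scenes_and_objects.json maps each scene type to a list of object types as described above; wnid_for and target_per_wnid are hypothetical stand-ins for the object_types.csv lookup and the per-wnid image quota:

```python
import json

with open("scenes_and_objects.json") as f:
    # e.g. {"Kitchen": ["Apple", "Bread", ...], "Bathroom": [...], ...}
    scenes_and_objects = json.load(f)

def scene_type_done(scene_type, counts, wnid_for, target_per_wnid):
    # Skip every scene of this type once the wnid of each object type
    # that appears in it has reached the per-wnid image quota.
    return all(counts.get(wnid_for(object_type), 0) >= target_per_wnid
               for object_type in scenes_and_objects[scene_type])
```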
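
Stopping and resuming is plain JSON bookkeeping. A minimal sketch, assuming hypothetical top-level keys counts and scene_index; the actual layout of progress.json in image_dataset.py may differ:

```python
import json
from pathlib import Path

# Hypothetical path and key names; see image_dataset.py for the real layout.
progress_path = Path.home() / "ai2thor_image_dataset" / "progress.json"
if progress_path.exists():
    progress = json.loads(progress_path.read_text())
    counts = progress["counts"]            # images generated so far, per wnid
    scene_index = progress["scene_index"]  # index into the array of scenes
else:
    counts, scene_index = {}, 0
```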

Comparison to TDW (single_object.py)

AI2-Thor is not designed for object image dataset generation, while TDW is designed for many tasks, image dataset generation among them.

AI2-Thor Advantages

  • Writing controller code for AI2-Thor is much easier than writing it for TDW; the TDW script is much longer and much more complex.
  • It is trivial to populate an AI2-Thor scene with multiple objects, while populating a scene in TDW would require custom proc-gen logic.

AI2-Thor Disadvantages

  • TDW can load arbitrary sets of models, such as models_full.json (2200 models), models_core.json (200 models), or ShapeNet SEM (9000 models). AI2-Thor has a fixed set of several hundred models.
  • TDW can distinguish between different models in the same wnid; AI2-Thor cannot. While AI2-Thor often has multiple models in the same category, the returned output data makes no distinction between them. For example, there are three AI2-Thor laptop models: Laptop1, Laptop2, and Laptop3. But in the output metadata, they are all labeled Laptop. In practice, this means that there is usually one exemplar model per category in AI2-Thor (though sometimes multiple "object types" can be classified in the same wnid, such as Bathtub and BathtubBasin). TDW can have an arbitrary number of exemplars per category.
  • TDW is capable of sophisticated framing techniques to ensure that an object is a certain size, rotation, etc. AI2-Thor doesn't have controls to arbitrarily position objects, or to scale them, or to set their rotation. This means that in any image generated by AI2-Thor, the tagged object might be very small, mostly occluded, mostly out of frame, etc.
  • AI2-Thor is substantially slower than TDW. Many of the shortcuts taken in image_dataset.py (such as saving an image multiple times for multiple objects) were implemented because otherwise generating the dataset would take weeks rather than days. Causes for the slowdown include:
    • AI2-Thor messages are sent via HTTP while TDW messages are sent via lower-level TCP sockets.
      • According to its own benchmarks, AI2-Thor can achieve a maximum of 240 FPS. TDW can achieve 850 FPS in a similar test.
      • Image capture (which is always much slower than just metadata) in AI2-Thor is not benchmarked, but TDW can capture images at up to 300 FPS.
    • AI2-Thor allows only one action per step, while TDW allows an arbitrarily long list of commands per step (see the sketch after this list). For example, single_object.py (TDW) sends roughly 20 commands to frame an object in a single step. To implement anything like this in AI2-Thor, the 20 commands would have to be sent across 20 steps, each of which is already slower than a TDW step.
    • TDW's output data is far more customizable than AI2-Thor's. AI2-Thor always returns all object and avatar metadata; image data can be toggled on and off, but only when initializing a new scene. TDW by default returns no meaningful output data, and output data can be toggled on and off at any step. single_object.py (TDW) uses this to speed up various processes; the script periodically disables image data (a very slow process in both AI2-Thor and TDW) when it is not needed.
    • AI2-Thor's minimum render size is 300x300, so all images must be scaled down to 256x256 before being written to disk. TDW can render at 256x256 natively (as well as virtually any other resolution).
    • TDW allows much greater control over which objects are in the scene and how many images of each will be captured. Because it is not possible to know ahead of time exactly which objects are in an AI2-Thor scene, image_dataset.py (AI2-Thor) must "hunt around" across multiple scenes to find smaller objects; this searching wastes many frames and a lot of time.
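
To illustrate the batched-commands difference referenced above, here is a hedged sketch of the TDW pattern. The command names follow TDW's documented JSON command API, but the sequence is illustrative and not copied from single_object.py; iron_box is just an example model name. Each of these commands would cost AI2-Thor a separate (slower) step, and arbitrary object teleportation and rotation aren't available in AI2-Thor at all:

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller()
object_id = c.get_unique_id()
# Build a room, add an object, create a camera, pose the object, and
# request an image -- all within a SINGLE simulation step:
resp = c.communicate([TDWUtils.create_empty_room(12, 12),
                      c.get_add_object(model_name="iron_box",
                                       object_id=object_id),
                      {"$type": "create_avatar",
                       "type": "A_Img_Caps_Kinematic",
                       "id": "a"},
                      {"$type": "teleport_object",
                       "id": object_id,
                       "position": {"x": 0, "y": 0, "z": 2}},
                      {"$type": "rotate_object_by",
                       "id": object_id,
                       "angle": 35,
                       "axis": "yaw"},
                      {"$type": "send_images",
                       "frequency": "once"}])
c.communicate({"$type": "terminate"})
```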

Other differences

  • Per its name, single_object.py (TDW) loads one object into a scene at a time. AI2-Thor always populates a scene with multiple objects.
  • Because it's not possible in AI2-Thor to specify which objects should be loaded, wnids with multiple object types won't have a uniform number of images per model. For example, Bathtub and BathtubBasin are both in the same wnid, but there may be more images of one than the other. The image number is also incremented per wnid rather than per model, so there won't be both a Bathtub_0000.jpg and a BathtubBasin_0000.jpg. In single_object.py (TDW), the number of images per model per wnid is always equal and the counter increments per model.
  • TDW has superior render quality. As of right now, we don't know how this affects ImageNet transfer results (for example, ShapeNet SEM models seem to yield better results than TDW's native models, despite being lower-quality).
  • single_object.py (TDW) varies the image background by choosing a combination of scene and HDRI skybox. single_object.py has exactly one scene, while multi_env.py has six scenes per dataset (several hundred scene/skybox combinations total); some of the scenes are indoor and some are outdoor. AI2-Thor has several hundred indoor scenes and no skybox variants. As of right now, we don't know which approach to background variability yields better transfer results, or to what extent background variability matters at all.