
gqn_datasets_translator

With major contributions from versatran01.

Data downloader and converter for the DeepMind GQN dataset (https://github.com/deepmind/gqn-datasets), for use with libraries other than TensorFlow.

Don't hesitate to make a pull request.

Dependencies

You need to install:

  • TensorFlow (https://www.tensorflow.org)
  • gsutil (https://cloud.google.com/storage/docs/gsutil). Note that gsutil works with Python 2.* only.

Download the tfrecord dataset

If you want to download the entire dataset:

gsutil -m cp -R gs://gqn-dataset/<dataset> .

If you want to download a proportion of the dataset only:

python download_gqn.py <dataset> <proportion>
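The actual selection logic lives in download_gqn.py; purely as an illustration, here is a hedged sketch of how a proportional download can work (the function names, and the exact bucket layout in the `gsutil ls` pattern, are assumptions): list the tfrecord shards, keep the first fraction of the sorted listing, and copy each one with gsutil.

```python
import math
import subprocess

def select_proportion(filenames, proportion):
    """Keep the first ceil(proportion * n) files of a sorted listing."""
    files = sorted(filenames)
    k = math.ceil(proportion * len(files))
    return files[:k]

def download(dataset, proportion, bucket="gs://gqn-dataset"):
    """List the dataset's shards, then copy only a fraction of them."""
    listing = subprocess.run(
        ["gsutil", "ls", "{}/{}/train/".format(bucket, dataset)],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for path in select_proportion(listing, proportion):
        subprocess.run(["gsutil", "cp", path, "."], check=True)
```

Downloading a deterministic prefix of the sorted listing (rather than a random sample) makes partial downloads resumable and reproducible.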

Convert the raw dataset

Command line options:

usage: convert2file.py [-h] [-b BATCH_SIZE] [-n FIRST_N] [-m MODE]
                       base_dir dataset

Convert gqn tfrecords to gzip files.

positional arguments:
  base_dir              base directory of gqn dataset
  dataset               datasets to convert, eg. shepard_metzler_5_parts

optional arguments:
  -h, --help            show this help message and exit
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        number of sequences in each output file
  -n FIRST_N, --first-n FIRST_N
                        convert only the first n tfrecords if given
  -m MODE, --mode MODE  whether to convert train or test
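Under the hood, convert2file.py groups sequences into batches of `--batch-size` and writes each batch to its own compressed file. As a minimal sketch of that batching step (the gzipped-pickle format and the file-naming scheme here are assumptions, not necessarily what the repo produces):

```python
import gzip
import pickle

def chunk(sequences, batch_size):
    """Split a list of sequences into batches of at most batch_size."""
    for i in range(0, len(sequences), batch_size):
        yield sequences[i:i + batch_size]

def write_batches(sequences, out_prefix, batch_size):
    """Write each batch as a gzip-compressed pickle; return the paths written."""
    paths = []
    for idx, batch in enumerate(chunk(sequences, batch_size)):
        path = "{}-{:04d}.gz".format(out_prefix, idx)
        with gzip.open(path, "wb") as f:
            pickle.dump(batch, f)
        paths.append(path)
    return paths
```

With 2000 sequences per record and a batch size of 128, each tfrecord would yield 16 output files (the last one partially filled).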

Convert all records with all sequences in sm5 train (400 records, 2000 seq each):

python convert2file.py ~/gqn_dataset shepard_metzler_5_parts

Convert first 20 records with batch size of 128 in sm5 test:

python convert2file.py ~/gqn_dataset shepard_metzler_5_parts -n 20 -b 128 -m test
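The point of converting is that the output can be read back without TensorFlow. Assuming the converter writes gzip-compressed pickle files (an assumption; check the repo for the actual serialization format), loading a batch needs only the standard library:

```python
import gzip
import pickle

def load_batch(path):
    """Load one converted batch file back into Python objects."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```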

Size of the datasets:

Names                                      Sizes
-----------------------------------------  ---------
jaco                                       198.97 GB
mazes                                      136.23 GB
rooms_free_camera_no_object_rotations      255.75 GB
rooms_free_camera_with_object_rotations    598.75 GB
rooms_ring_camera                          250.89 GB
shepard_metzler_5_parts                     21.09 GB
shepard_metzler_7_parts                     23.68 GB
-----------------------------------------  ---------
total                                        1.45 TB