imgnet_preproc

A repo to download and convert the ImageNet-1k dataset to TFRecords.

Objective

This repo provides a set of utility scripts for creating TFRecords from the ImageNet-1k dataset. People new to the dataset often spend a lot of time understanding its structure and writing code for this task, so I made this repo to make the process as easy as possible.

Points to remember

You need to extract (untar) the tar files into a directory. The scripts also expect the data directories to follow a specific structure and naming scheme (for the training set, one subdirectory per synset, e.g. n02105855/).
When running these scripts on the full ImageNet-1k dataset, it is recommended to use nohup or screen so that a dropped connection does not kill the process.
IMPORTANT: Delete n02105855/n02105855_2933.JPEG before starting, otherwise the process will crash partway through. (This file is actually a PNG image despite its .JPEG extension.)
These scripts are not intended to be used with gs:// file paths. I intend to add this functionality in the future.
Please take a look at this doc page from Google Cloud to set up a Cloud VM for this.
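Before a long run, it can help to scan the training directory for files that carry a .JPEG extension but are not actually JPEG data, such as the file mentioned above. The sketch below is not part of this repo: the script name check_jpegs.py and the helpers list_train_images, is_jpeg_header and find_mislabeled_files are hypothetical. It assumes the layout <data_dir>/<synset>/<synset>_<id>.JPEG and relies only on the fact that JPEG files begin with the bytes FF D8 FF.

```python
from pathlib import Path

# JPEG files start with the SOI marker FF D8 followed by another 0xFF marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"


def is_jpeg_header(header: bytes) -> bool:
    """Return True if the leading bytes look like a JPEG file."""
    return header.startswith(JPEG_MAGIC)


def list_train_images(data_dir):
    """Return sorted (image_path, synset) pairs.

    Assumed layout: <data_dir>/<synset>/<synset>_<id>.JPEG,
    e.g. n02105855/n02105855_2933.JPEG.
    """
    pairs = []
    for synset_dir in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        for image_path in sorted(synset_dir.glob("*.JPEG")):
            pairs.append((image_path, synset_dir.name))
    return pairs


def find_mislabeled_files(data_dir):
    """Return paths of .JPEG files whose contents are not actually JPEG."""
    bad = []
    for image_path, _ in list_train_images(data_dir):
        with open(image_path, "rb") as f:
            if not is_jpeg_header(f.read(len(JPEG_MAGIC))):
                bad.append(image_path)
    return bad


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 check_jpegs.py <train_data_dir>
    for path in find_mislabeled_files(sys.argv[1]):
        print(path)
```

Checking the magic bytes only reads the first three bytes of each file, so the scan is cheap compared with decoding every image.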

Usage:

python3 make_tfrecs.py \
    --odir <Output directory.> \
    --data_dir <Input data directory.> \
    --file_prefix <Prefix to add to all output file names.> \
    --synset_filepath <Path to JSON file containing synsets. The default file is used if unspecified.> \
    --batch_size <Batch size for the dataset. Each shard contains this many examples.> \
    --log_freq <`Writing shard..` is printed after every this many shards.> \
    --shuffle <Specify this flag to shuffle the dataset before making TFRecords.> \
    --validation_set <Specify this flag if the input directory has the file structure of the ImageNet validation set.>
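The way --batch_size, --shuffle, --file_prefix and --log_freq interact can be sketched in plain Python. This is an illustration under assumptions, not the code in make_tfrecs.py: plan_shards and shard_filename are hypothetical helpers, and the <prefix>_<index>-of-<total>.tfrecord naming scheme is invented for the example. Actual serialization (e.g. with tf.io.TFRecordWriter) is left out.

```python
import os
import random


def plan_shards(examples, batch_size, shuffle=False, seed=None):
    """Split examples into consecutive shards of at most batch_size items.

    Mirrors --batch_size (one shard holds this many examples) and --shuffle.
    The last shard may be smaller than batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    examples = list(examples)
    if shuffle:
        # Shuffle a copy with its own RNG so results are reproducible given a seed.
        random.Random(seed).shuffle(examples)
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def shard_filename(odir, file_prefix, index, num_shards):
    """Hypothetical naming scheme: <odir>/<prefix>_<index>-of-<num_shards>.tfrecord."""
    return os.path.join(odir, f"{file_prefix}_{index:05d}-of-{num_shards:05d}.tfrecord")


if __name__ == "__main__":
    shards = plan_shards(range(10), batch_size=4)
    log_freq = 2
    for i, shard in enumerate(shards):
        if i % log_freq == 0:
            print("Writing shard..", shard_filename("out", "train", i, len(shards)))
        # A real writer would serialize each example in `shard` to the file here.
```

Shuffling before sharding (rather than within each shard) spreads every class across all shards, which is usually what you want when reading the shards back in a streaming fashion.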

Description of scripts:

make_tfrecs.py : Driver script.
tfrecs_utils.py : Utilities for making TFRecords.
image_utils.py : Utilities for image manipulation.
valid_labels.txt : Labels for the validation set of ImageNet-1k.
synset_to_human.json : Mapping from synsets to human-readable labels.
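A synset mapping such as synset_to_human.json is typically turned into integer class ids before writing labels. The sketch below assumes the JSON file is a flat object mapping synset ids (e.g. "n02105855") to human-readable strings, and it assigns ids by sorted synset order; both the file structure and that ordering convention are assumptions, and load_synsets and synset_to_index are hypothetical names, not functions from this repo.

```python
import json


def load_synsets(synset_filepath):
    """Load an assumed flat {synset_id: human_readable_label} mapping from JSON."""
    with open(synset_filepath, "r", encoding="utf-8") as f:
        return json.load(f)


def synset_to_index(synset_map):
    """Assign integer class ids 0..N-1 in sorted synset order (assumed convention)."""
    return {synset: idx for idx, synset in enumerate(sorted(synset_map))}


if __name__ == "__main__":
    synsets = load_synsets("synset_to_human.json")
    index = synset_to_index(synsets)
    print(len(index), "classes")
```

Sorting the synset ids makes the id assignment deterministic regardless of the key order in the JSON file.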

AdityaKane2001/imgnet_preproc