/imagenetscraper

👁 Bulk-download all thumbnails from an ImageNet synset, with optional rescaling

Primary LanguagePythonGNU Affero General Public License v3.0AGPL-3.0

imagenetscraper

PyPI Build Status

Command-line utility for downloading all thumbnail images from an ImageNet synset, optionally rescaling to a different resolution.

Usage

Usage: imagenetscraper [OPTIONS] SYNSET_ID [OUTPUT_DIR]

Options:
  -c, --concurrency INTEGER  Number of concurrent downloads (default: 8).
  -s, --size WIDTH,HEIGHT    If specified, images will be rescaled to the
                             given size.
  -q, --quiet                Suppress progress output.
  -h, --help                 Show this message and exit.
  --version                  Show the version and exit.

If the URL of a synset page looks like:

http://image-net.org/synset?wnid=n00000000
                                 ^^^^^^^^^
                                 SYNSET_ID

SYNSET_ID is the n00000000 part. For example, for the "person, individual, someone, somebody, mortal, soul" synset at http://image-net.org/synset?wnid=n00007846, the corresponding synset id is n00007846.

The default output directory (OUTPUT_DIR) is the current directory.

Examples

To download all thumbnail imagess from the synset mentioned above to the directory "person_images", run:

imagenetscraper n00007846 person_images

To do the same, but with each thumbnail image rescaled to a width of 256 and a height of 128, add --size 256,128:

imagenetscraper n00007846 person_images --size 256,128

To run in "quiet mode", suppressing progress output, add --quiet:

imagenetscraper n00007846 person_images --size 256,128 --quiet

By default, imagenetscraper will download 8 images at once. To change this, use --concurrency:

imagenetscraper n00007846 person_images --size 256,128 --concurrency 4

Install

  1. Install Python 3, pip, and a development version of libjpeg. imagenetscraper is tested with Python 3.4-3.7 and libjpeg-turbo 8.

    sudo apt-get install python3 python3-pip libjpeg-turbo8-dev
    
  2. Download and install imagenetscraper with pip.

    sudo -H pip3 install imagenetscraper
    

Citation

If this tool helped with your research, a citation would be appreciated:

@Misc{imagenetscraper,
author = {Michael Smith},
title = {imagenetscraper: Bulk-download thumbnails from ImageNet synsets},
howpublished = {\url{https://github.com/spinda/imagenetscraper}},
year = {2017}
}

Testing

To run unit tests, use:

python3 setup.py test

License

Copyright (C) 2017-2018 Michael Smith <michael@spinda.net>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.