A set of scripts useful for deep learning and AI work, originally written for use with the fast.ai
lectures and libraries.
image_download.py downloads images (typically up to 1000) from a specified search engine, currently Google or Bing. It is useful in several respects:
- Because it uses Selenium, it is not limited by the search engine API and generally allows more images to be downloaded.
- It can operate in headless mode, which means it can be used on a server without access to a GUI browser.
- The default browser is Firefox. The script can be modified to use other browsers such as Chrome.
usage: image_download.py [-h] [--gui] [--engine {google,bing}]
searchtext num_images
Select, search, download and save a specified number images using a choice of
search engines
positional arguments:
searchtext Search Image
num_images Number of Images
optional arguments:
-h, --help show this help message and exit
--gui, -g Use Browser in the GUI
--engine {google,bing}, -e {google,bing}
Search Engine, default=google
Example: image_download.py 'dog' 200 --engine 'bing' --gui
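If you want to script around image_download.py, the command-line interface documented above can be reconstructed with Python's standard argparse module. This is a sketch inferred from the help text, not the script's actual source; the build_parser name and parsing num_images as an int are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented image_download.py CLI."""
    parser = argparse.ArgumentParser(
        prog="image_download.py",
        description="Select, search, download and save a specified number of "
                    "images using a choice of search engines",
    )
    parser.add_argument("searchtext", help="Search Image")
    # Assumption: the count is parsed as an integer.
    parser.add_argument("num_images", type=int, help="Number of Images")
    # Without --gui the browser runs headless.
    parser.add_argument("--gui", "-g", action="store_true",
                        help="Use Browser in the GUI")
    parser.add_argument("--engine", "-e", choices=["google", "bing"],
                        default="google", help="Search Engine, default=google")
    return parser


if __name__ == "__main__":
    # Same arguments as the example above (the shell strips the quotes).
    args = build_parser().parse_args(["dog", "200", "--engine", "bing", "--gui"])
    print(args.searchtext, args.num_images, args.engine, args.gui)
```

Passing an engine other than google or bing makes argparse exit with an error listing the valid choices.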
Installation:
virtualenv -p python3.6 env
source env/bin/activate
pip install -r requirements.txt
- Install a browser and the browser driver that matches your browser and operating system, and place the driver in a directory on your PATH. For example, with Ubuntu and Firefox (geckodriver):
tar xfvz geckodriver-v0.19.1-linux64.tar.gz
mv geckodriver ~/bin/
where ~/bin is a directory on your PATH.
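To confirm that the driver is reachable before running image_download.py, the standard library's shutil.which is enough. This is a minimal sketch; the helper name driver_on_path is hypothetical and not part of the repository.

```python
import shutil


def driver_on_path(name: str = "geckodriver") -> bool:
    """Hypothetical helper: True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    location = shutil.which("geckodriver")
    print(f"geckodriver: {location or 'NOT FOUND on PATH'}")
```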
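make_train_valid.py splits a directory of labeled images into training, validation and (optionally) test sets.
usage: make_train_valid.py [-h] [--train TRAIN] [--valid VALID] [--test TEST]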
labels_dir
Make a train-valid directory and randomly copy files from labels_dir to sub-
directories
positional arguments:
labels_dir Contains at least two directories of labels, each containing
files of that label
optional arguments:
-h, --help show this help message and exit
--train TRAIN files for training, default=.8
--valid VALID files for validation, default=.2
--test TEST files for testing, default=.0
For example, given a directory:
catsdogs/
..cat/[*.jpg]
..dog/[*.jpg]
make_train_valid.py catsdogs --train .75 --valid .25
Creates the following directory structure:
catsdogs/
..cat/[*.jpg]
..dog/[*.jpg]
..train/
....cat/[*.jpg]
....dog/[*.jpg]
..valid/
....cat/[*.jpg]
....dog/[*.jpg]
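The split that make_train_valid.py performs can be illustrated in pure Python. This is a sketch under stated assumptions, not the script's code: the function name and the seed parameter are hypothetical, the fractions are applied separately to each label, and cumulative rounding is used so that no file is dropped when the fractions sum to 1.

```python
import random
import shutil
from pathlib import Path

SPLITS = ("train", "valid", "test")


def make_train_valid(labels_dir, train=0.8, valid=0.2, test=0.0, seed=None) -> None:
    """Hypothetical sketch: randomly copy each label's files into split sub-directories."""
    root = Path(labels_dir)
    rng = random.Random(seed)
    fractions = {"train": train, "valid": valid, "test": test}
    # Label directories are sub-directories that are not themselves split outputs.
    labels = sorted(p for p in root.iterdir() if p.is_dir() and p.name not in SPLITS)
    for label in labels:
        files = sorted(f for f in label.iterdir() if f.is_file())
        rng.shuffle(files)
        n = len(files)
        start, cumulative = 0, 0.0
        for split in SPLITS:
            cumulative += fractions[split]
            # Cut at the rounded cumulative fraction so rounding never loses files.
            end = round(n * cumulative)
            if fractions[split] > 0:
                dest = root / split / label.name
                dest.mkdir(parents=True, exist_ok=True)
                for f in files[start:end]:
                    shutil.copy2(f, dest / f.name)  # copy, leaving originals in place
            start = end
```

For the catsdogs example above, make_train_valid("catsdogs", train=0.75, valid=0.25, seed=0) would create catsdogs/train/cat, catsdogs/train/dog, catsdogs/valid/cat and catsdogs/valid/dog, and no test directory because its fraction is 0. Fixing the seed makes the split reproducible.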
filter_img.py uses the file command to determine the type of each picture, then filters (keeps) only pictures of a specified type.
Images are filtered in place, i.e., non-JPEG files are deleted. (This can be modified within the script.)
Usage: filter_img.py image_directory
Example: filter_img.py dogs/
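filter_img.py relies on the external file command. As a dependency-free alternative (a swapped-in technique, not what the script does), JPEG files can be recognized by their leading magic bytes FF D8 FF. The function names below are hypothetical; dry_run=True reports which files would be deleted without touching them.

```python
from pathlib import Path

# JPEG files start with the SOI marker FF D8 followed by another marker byte FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def is_jpeg(path: Path) -> bool:
    """Hypothetical helper: check the first three bytes for the JPEG signature."""
    with open(path, "rb") as fh:
        return fh.read(3) == JPEG_MAGIC


def filter_jpegs(image_dir, dry_run=False):
    """Hypothetical sketch: delete, in place, every regular file that is not a JPEG.

    Returns the paths that were (or, with dry_run=True, would be) deleted.
    """
    removed = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and not is_jpeg(path):
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Checking content rather than the file extension matters here because downloaded files named .jpg are sometimes PNG, GIF or HTML error pages.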
Example workflow, building a two-class BMW vs. Cadillac dataset:
image_download.py 'bmw' 500 --engine 'bing' --gui
image_download.py 'cadillac' 500 --engine 'google'
mv dataset cars
filter_img.py cars/bmw
filter_img.py cars/cadillac
make_train_valid.py cars --train .75 --valid .25