This is a from-scratch PyTorch implementation of "Colorful Image Colorization" [1] by Zhang et al. created for the Deep Learning in Data Science course at KTH Stockholm.
The following sections describe in detail:
- how to install the dependencies necessary to get started with this project
- how to colorize grayscale images using pretrained network weights
- how to train the network on a new dataset
- how to run colorization programatically
We recommend you use Anaconda to create a virtual enviroment in which to install the modules needed to run this program, i.e you should run:
conda env create --file environment.yml
There are some extra dependencies needed to run some of the scripts but you should install those manually when it becomes necessary as you may not need them.
In addition, you will need some files provided by R. Zhang, these include the
points of the ab gamut bins used to discretize the image labels and pretrained
Caffe models. Run resources/get_resources.sh
to download these automatically.
If you skip this step you will not be able to run the network at all, even if
you provide your own weight initialization or want to train from scratch.
You might also notice another shell script, data/get_cval.sh
, that downloads
several additional resources. However, this is mainly a remnant of the
developement process and you can safely ignore it.
In order to use the pretrained weights for prediction, you will have to convert
them from Caffe to PyTorch. We provide the convenience script
scripts/convert_weights
for exactly this purpose. In order to use it you will
have to install the caffe
Python module (if you want to convert one of the
Caffe models provided by R. Zhang)
For example, in order to convert the Caffe model trained with class rebalancing
downloaded by resources/get_resources.sh
, you can call the script like this:
./scripts/convert_weights vgg PYTORCH_WEIGHTS.tar \
--weights resources/colorization_release_v2.caffemodel \
--proto resources/colorization_deploy_v2.prototxt
Which will save the converted PyTorch weights to PYTORCH_WEIGHTS.tar
.
The easiest way to colorize several grayscale images of arbitrary size is to
place them in the same directory and colorize them in batch mode using
scripts/convert_images
. For example, if you have placed the images in
directory dir1
and subsequently run:
./scripts/convert_images predict_color \
--input-dir dir1 \
--output-dir dir2 \
--model-checkpoint PYTORCH_WEIGHTS.tar \
--gpu \
--verbose
The script will colorize all images in dir1
on the GPU and place the results
in dir2
(with the same filenames). You can choose an annealed mean
temperature parameter other then the default 0.38 with --annealed-mean-T
. .
If you intend to train the network on your own dataset, you might want to use
the convenience scripts scripts/prepare_dataset
to convert it into a form
suitable for training. For example, if all your images are stored in a
directory tree similar to this one:
dir1/
├── subdir1
│ ├── img1.JPEG
│ ├── img2.JPEG
│ └── ...
├── subdir2
│ ├── img1.JPEG
│ ├── img2.JPEG
│ └── ...
└── ...
you may want to run:
./scripts/prepare_dataset dir1 \
--flatten \
--purge \
--clean \
--file-ext JPEG \
--val-split 0.2 \
--test-split 0.1 \
--resize-height 256 \
--resize-width 256 \
--verbose
The script will first recursively look for images files with the extension
.JPEG
in dir1
and remove all other files and those images that cannot be
read or converted to RGB. It will then resize all remaining images to 256x256
and randomly place them in the newly created subdirectories train
, val
and
test
using a 70/20/10 split.
Note that this will take a while for large datasets since every single image
has to be read into memory. If your images already have the desired size (this
does not necessarily have to be 256x256, the network is fully convolutional and
can train on images of arbitrary size) and you are sure that none of them are
corrupted, you don't have to use the --resize-height/--resize-width
and
--clean
arguments which will speed up the process considerably.
To train the network on your dataset you can use the script
scripts/run_training
. The script accepts command line arguments that control
e.g. the duration of the training and where/how often logfiles and model
checkpoints are written. More specific settings like dataloader configuration,
network type and optimizer settings need to be specified via a configuration
file which is essentially a nested directory of Python objects converted to
JSON. Most likely you will want to use config/default.json
and provide
specific settings or override some defaults in a separate JSON file. See
config/vgg.json
for an example.
Once you have decided on a configuration file you can run the script as follows:
./scripts/run_training \
--config YOUR_CONFIG.json \
--default-config config/default.json \
--data-dir dir1 \
--checkpoint-dir YOUR_CHECKPOINT_DIR \
--log-file YOUR_LOG_FILE.txt \
--iterations ITERATIONS \
--iterations-till-checkpoint ITERATIONS_TILL_CHECKPOINT \
--init-model-checkpoint INIT_MODEL-CHECKPOINT.tar
This will recursively merge the configurations in YOUR_CONFIG.json
and
config/default.json
and then train on the the images in dir1
for
ITERATIONS
iterations (batches). Every ITERATIONS_TILL_CHECKPOINT
iterations, an intermediate model checkpoint will be written to
YOUR_CHECKPOINT_DIR
. Specifying --init-model-checkpoint
is optional but
useful if you want to finetune the network from some pretrained set of weights.
You can also continue training from an arbitrary training checkpoint using the
--continue-training
flag which will load network weights and optimizer state
from INIT_MODEL_CHECKPOINT.tar
(which has to be a checkpoint created by a
previous run of scripts/run_training
) and pick the training up from the last
training iteration (thus ITERATIONS
still specifies the total number of
training iterations).
Colorizing images programmatically using our implementation is very simple. You first need to instantiate the network itself:
from colorization import ColorizationNetwork
network = ColorizationNetwork(annealed_mean_T=0.38, device='gpu')
The parameters should be self explanatory (and are in this case optional), use
device='cpu'
if you plan to run the network on the CPU.
You will then need to wrap the network in an instance of ColorizationModel
which implements (among other things) checkpoint saving/loading:
from colorization import ColorizationModel
model = ColorizationModel(network)
model.load_checkpoint('YOUR_CHECKPOINT_DIR/checkpoint_final.tar')
In order to colorize a grayscale image you should then:
- load it into a numpy array
- resize a copy of it to 224x224 (this is not strictly necessary but produces better results)
- convert it to a torch tensor
- pass it through the model
- reassemble the result
All of this is already implemented in a convenience function:
from colorization predict_color
from skimage.io import imread
img = imread('YOUR_IMAGE.jpg')
img_colorized = predict_color(model, img)
[1] Colorful Image Colorization, Zhang, Richard and Isola, Phillip and Efros, Alexei A, in ECCV 2016 (website)