Style transfer between real photos and anime images using CycleGAN

Real world to anime style transfer

This repository contains a TensorFlow implementation of CycleGAN (https://arxiv.org/abs/1703.10593). CycleGAN is a GAN-based neural network for style transfer that does not require paired training data.
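Unpaired training works because of the cycle-consistency loss from the CycleGAN paper: translating an image to the other domain and back should reproduce the original. Below is a minimal, framework-free sketch. It assumes a generator `G` (real to anime) and a generator `F` (anime to real), with images represented as flat lists of floats. These names are hypothetical, and the exact losses used in code/mod-cycle-gan may differ.

```python
from typing import Callable, List, Sequence

Image = List[float]  # hypothetical: an image flattened to its pixel values
Generator = Callable[[Image], Image]


def mean_abs_error(a: Image, b: Image) -> float:
    """Mean L1 distance between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Generator, F: Generator,
                           real: Sequence[Image], anime: Sequence[Image]) -> float:
    """L1 cycle loss: real -> anime -> real plus anime -> real -> anime."""
    forward = sum(mean_abs_error(F(G(x)), x) for x in real) / len(real)
    backward = sum(mean_abs_error(G(F(y)), y) for y in anime) / len(anime)
    return forward + backward


if __name__ == "__main__":
    # Toy generators that are exact inverses of each other give zero cycle loss.
    brighten = lambda img: [p + 0.25 for p in img]
    darken = lambda img: [p - 0.25 for p in img]
    print(cycle_consistency_loss(brighten, darken, [[0.0, 0.5]], [[1.0, 0.75]]))
```

In the paper, this term is added with a weight λ to the adversarial losses of the two generator/discriminator pairs. Without it, nothing ties a translated image to the particular input it came from.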

This implementation is heavily based on Otakar Jašek's diploma thesis.

It uses two datasets: real and anime. The real data come from a common computer-vision dataset, Ade20k, combined with various cosplay photographs. The anime data are frames from anime videos, sampled at 1 FPS.
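Sampling at 1 FPS amounts to keeping the frame nearest to each whole second. Below is a minimal sketch of that index selection. It is an assumption about what the sampling does, not the repository's code, and it leaves the actual video decoding to a video library. The function name is hypothetical.

```python
from typing import List


def frames_at_rate(total_frames: int, native_fps: float, target_fps: float = 1.0) -> List[int]:
    """Indices of the frames nearest to the instants 0, 1/target_fps, 2/target_fps, ..."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count and frame rates must be positive")
    duration = total_frames / native_fps  # video length in seconds
    indices = []
    k = 0
    while k / target_fps < duration:
        t = k / target_fps  # sampling instant in seconds
        # Nearest frame to time t, clamped to the last valid frame index.
        indices.append(min(round(t * native_fps), total_frames - 1))
        k += 1
    return indices
```

Frame rates such as 23.976 are why the sketch rounds to the nearest frame instead of taking every 24th frame. Over a long episode, a fixed stride of 24 frames would slowly drift away from whole seconds.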

The code is in code/mod-cycle-gan, and code/mod-cycle-gan/data_preparation contains the data preparation scripts. The neural network takes its input in TensorFlow's native format: .tfrecord files holding protobuf records. Videos are first sampled to obtain images, and the images are then converted into .tfrecord files containing the training data in protobuf format.
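A .tfrecord file is a sequence of length-prefixed, checksummed records. In this pipeline, each record presumably holds one serialized tf.train.Example containing an encoded image. Below is a minimal pure-Python sketch of that container framing, meant only to show what the files contain. In practice, TensorFlow's own writer and reader (`tf.io.TFRecordWriter`, `tf.data.TFRecordDataset` in current versions) handle this, and the payloads here are opaque bytes rather than protobufs.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def _crc32c_table() -> list:
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores the CRC rotated right by 15 bits plus a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(f: BinaryIO, payloads: Iterable[bytes]) -> None:
    for data in payloads:
        header = struct.pack("<Q", len(data))           # uint64 payload length
        f.write(header)
        f.write(struct.pack("<I", masked_crc(header)))  # uint32 CRC of the length
        f.write(data)
        f.write(struct.pack("<I", masked_crc(data)))    # uint32 CRC of the payload


def _read_exact(f: BinaryIO, n: int) -> bytes:
    buf = f.read(n)
    if len(buf) != n:
        raise ValueError("truncated record")
    return buf


def read_records(f: BinaryIO) -> Iterator[bytes]:
    """Yield payloads, raising ValueError on truncation or a CRC mismatch."""
    while True:
        header = f.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError("truncated length header")
        (expected,) = struct.unpack("<I", _read_exact(f, 4))
        if masked_crc(header) != expected:
            raise ValueError("corrupted length header")
        (length,) = struct.unpack("<Q", header)
        data = _read_exact(f, length)
        (expected,) = struct.unpack("<I", _read_exact(f, 4))
        if masked_crc(data) != expected:
            raise ValueError("corrupted payload")
        yield data


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records(buf, [b"first payload", b"second payload"])
    buf.seek(0)
    print(list(read_records(buf)))
```

The pure-Python CRC is slow on large files. It is meant to show the layout (uint64 length, masked CRC-32C of the length, payload, masked CRC-32C of the payload), not to replace TensorFlow's implementation.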

Images in a tfrecord file can be corrupted. You can check them with a script:

python data_preparation/check_tfrecords.py --file=<absolute path to the tfrecord>
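A cheap way to spot the most common kind of corruption, a truncated encoded image, is to check the format's start and end markers. The sketch below is a hypothetical illustration, not what check_tfrecords.py does. It assumes the encoded PNG/JPEG bytes have already been extracted from each record, and it only catches truncation. Fully decoding each image (for example with `tf.image.decode_image`) is the stronger test.

```python
from typing import List, Sequence

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# A PNG always ends with an empty IEND chunk: length 0, type "IEND", CRC AE 42 60 82.
PNG_TRAILER = b"\x00\x00\x00\x00IEND\xaeB`\x82"
JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_complete(image: bytes) -> bool:
    """Cheap truncation check for PNG/JPEG bytes; does not decode any pixels."""
    if image.startswith(PNG_SIGNATURE):
        return image.endswith(PNG_TRAILER)
    if image.startswith(JPEG_SOI):
        return image.endswith(JPEG_EOI)
    return False  # unknown format: treat as suspicious


def suspicious_indices(images: Sequence[bytes]) -> List[int]:
    """Positions of encoded images that fail the truncation check."""
    return [i for i, img in enumerate(images) if not looks_complete(img)]
```

Some valid JPEGs carry extra bytes after the end-of-image marker, so a flagged image is a candidate for a closer look, not proof of corruption.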

The neural network is then trained by code/mod-cycle-gan/train.py, which is fed two tfrecord files: one with real data and one with anime data. The trained network can then be used for inference, transforming real images into anime-style images with the code/mod-cycle-gan/transform.py script.
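Because the two datasets are unpaired, each training step can draw a batch of real images and a batch of anime images independently. Below is a framework-free sketch of that feeding scheme. train.py presumably reads the two tfrecord files through TensorFlow, so the function names and the shuffling strategy here are assumptions for illustration only.

```python
import itertools
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def endless_shuffled(items: Sequence[T], seed: int) -> Iterator[T]:
    """Yield items forever, reshuffling at the start of every epoch."""
    rng = random.Random(seed)
    pool = list(items)
    while True:
        rng.shuffle(pool)
        yield from pool


def batches(stream: Iterator[T], batch_size: int) -> Iterator[List[T]]:
    while True:
        yield list(itertools.islice(stream, batch_size))


def unpaired_batches(real: Sequence[T], anime: Sequence[T], batch_size: int,
                     seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Pair each real batch with an independently drawn anime batch."""
    if not real or not anime:
        raise ValueError("both datasets must be non-empty")
    real_stream = endless_shuffled(real, seed)
    anime_stream = endless_shuffled(anime, seed + 1)  # independent shuffle order
    return zip(batches(real_stream, batch_size), batches(anime_stream, batch_size))
```

The two datasets do not need to be the same size. The smaller one simply wraps around into its next epoch sooner.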

Example of starting training:

python3 train.py --batchsize=2 --Ytfr=../../datasets/anime/houseki-no-kuni.tfrecord

If you need to run training on a server in the background, you can use the run-network-bg.sh script. For example, you can run the same network in the background with:

./run-network-bg.sh --batchsize=2 --Ytfr='../../datasets/anime/houseki-no-kuni.tfrecord'
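A script like run-network-bg.sh typically starts train.py with the given arguments, detaches it from the terminal, and redirects its output to a log file. Assuming that is what this script does, a rough Python equivalent could look like the sketch below. The function and the log file name are hypothetical.

```python
import subprocess
import sys
from typing import List


def launch_in_background(argv: List[str], log_path: str) -> subprocess.Popen:
    """Start argv in its own session, appending stdout and stderr to log_path."""
    with open(log_path, "ab") as log:
        # The child inherits its own copy of the log file descriptor,
        # so closing ours when the with-block ends does not affect it.
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX setsid(): logging out does not stop it
        )
```

For example, `launch_in_background([sys.executable, "train.py", "--batchsize=2", "--Ytfr=../../datasets/anime/houseki-no-kuni.tfrecord"], "train.log")` starts the same run as above, and `tail -f train.log` follows its progress.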

The trained network is stored in .pb files, which contain a very compact protobuf representation of it. A .pb file is much smaller than a checkpoint, so it can even be versioned in git.

Trained networks are stored in export/<network name>/<current training step>.
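Each export lands in its own step directory, so picking the newest model means comparing directory names as numbers rather than as strings (lexically, "9000" sorts after "80000"). Below is a small helper that assumes the step directories are named by plain integers under the export/ root. The helper name is hypothetical.

```python
import os
from typing import Tuple


def latest_export(export_root: str, network_name: str) -> Tuple[int, str]:
    """Return (step, path) of the newest export under export_root/network_name."""
    base = os.path.join(export_root, network_name)
    candidates = [
        (int(name), name)
        for name in os.listdir(base)
        if name.isdecimal() and os.path.isdir(os.path.join(base, name))
    ]
    if not candidates:
        raise FileNotFoundError(f"no exported steps under {base}")
    step, name = max(candidates)  # numeric comparison, so 80000 beats 9000
    return step, os.path.join(base, name)
```

For example, `latest_export("export", "20180625-1659-0")` would return the most recent step of that run, provided the run name is what goes in `<network name>`.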

To transform a video, we split it into images, transform them, and then assemble a video from the transformed images. Audio is lost with this approach, since only the frames are processed. Example commands:

python data_preparation/videos_to_images.py --videos_dir=../../dataset-sources/real/videos/animefest-2017-cosplay --images_dir=../../dataset-sources/real/images/animefest-2017-cosplay
python transform.py --inpath=../../dataset-sources/real/images/animefest-2017-cosplay/*.png --outdir=../../data/images/animefest-2017-cosplay --includein=0 --rundir=20180625-1659-0
python data_preparation/images_to_videos.py --images_dir=../../data/images/animefest-2017-cosplay/20180625-1659-0/80000 --video_path=../../data/videos/animefest-cosplay.avi
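The three commands can be chained so that a failing step stops the whole run. The sketch below builds exactly the commands shown above from a few parameters. Based on the example paths, it assumes that transform.py writes its output to <outdir>/<rundir>/<step>, where <step> is the training step of the exported network being used. The function names are hypothetical.

```python
import subprocess
import sys
from typing import List


def video_pipeline_commands(videos_dir: str, images_dir: str, out_dir: str,
                            rundir: str, step: int, video_path: str) -> List[List[str]]:
    """Argument lists for: video -> frames, frames -> anime frames, frames -> video."""
    # Assumed output layout of transform.py, inferred from the example paths.
    transformed_dir = f"{out_dir}/{rundir}/{step}"
    return [
        [sys.executable, "data_preparation/videos_to_images.py",
         f"--videos_dir={videos_dir}", f"--images_dir={images_dir}"],
        [sys.executable, "transform.py",
         f"--inpath={images_dir}/*.png", f"--outdir={out_dir}",
         "--includein=0", f"--rundir={rundir}"],
        [sys.executable, "data_preparation/images_to_videos.py",
         f"--images_dir={transformed_dir}", f"--video_path={video_path}"],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    for argv in commands:
        # check=True raises CalledProcessError, so later steps never run on bad input.
        subprocess.run(argv, check=True)
```

Without a shell, the `*.png` pattern reaches transform.py unexpanded. The same happens with the shell command above, so the script presumably expands the pattern itself.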

Images extracted from videos take up a lot of space and are not needed once the tfrecords are generated, so you can delete them.

Results (so far)

After training on two datasets, Ade20k and the anime series and movie No Game No Life, I obtained the following results on the Ade20k dataset (training data):

Image of results

On testing data (not used for training), I obtained interesting results, although with some slight artifacts. The following images are photos of the Czech cosplayer Lena; be sure to check out her content (Facebook, Instagram).

Image of results

Unfortunately, other training runs have not shown any more interesting results so far.