reference_based_sketch_image_colorization

PyTorch implementation of the paper "Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence" (CVPR 2020)

Dependencies

  • PyTorch
  • torchvision
  • numpy
  • PIL
  • OpenCV
  • tqdm

Usage

  1. Clone the repository
  • git clone https://github.com/Snailpong/reference_based_sketch_image_colorization.git
  2. Dataset download
  • Tag2Pix (filtered Danbooru2020): Link
  • You need to change 'danbooru2018' to 'danbooru2020' in the download script.
  • In my experiment, I used about 6,000 images filtered by python preprocessor/tagset_extractor.py.
    • I stopped the download after the 0080 folder had finished.
  3. Sketch image generation
  • XDoG: Link
  • For automatic generation, I edited the main function of the XDoG script as follows:
```python
import os
import cv2

if __name__ == '__main__':
    # Convert every color image to a sketch and save it with the same file name.
    for file_name in os.listdir('../data/danbooru/color'):
        print(file_name, end='\r')
        image = cv2.imread(f'../data/danbooru/color/{file_name}', cv2.IMREAD_GRAYSCALE)
        result = xdog(image)  # xdog() is defined in the XDoG script linked above
        cv2.imwrite(f'../data/danbooru/sketch/{file_name}', result)
```
  • Folder structure example (a minimal loader sketch for this layout follows the tree):
```
.
└── data
    ├── danbooru
    │   ├── color
    │   │   ├── 7.jpg
    │   │   └── ...
    │   └── sketch
    │       ├── 7.jpg
    │       └── ...
    └── val
        ├── color
        │   ├── 1.jpg
        │   └── ...
        └── sketch
            ├── 1.jpg
            └── ...
```
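A minimal sketch of a paired loader for this layout (the class name, arguments, and the absence of transforms are assumptions; the repository's actual dataset code may differ):
```python
import os

from PIL import Image
from torch.utils.data import Dataset


class PairedSketchDataset(Dataset):
    """Pairs color and sketch images that share a file name, following the layout above."""

    def __init__(self, root='data/danbooru'):
        self.color_dir = os.path.join(root, 'color')
        self.sketch_dir = os.path.join(root, 'sketch')
        self.files = sorted(os.listdir(self.color_dir))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        color = Image.open(os.path.join(self.color_dir, name)).convert('RGB')
        sketch = Image.open(os.path.join(self.sketch_dir, name)).convert('L')
        return color, sketch
```
A standard torch.utils.data.DataLoader (with a ToTensor-style transform added) can then iterate over the color/sketch pairs.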
  4. TPS transformation module
  • TPS: Link
  • Place the thinplate folder in the main (top-level) folder; a conceptual warp example is sketched below.
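The thinplate module above provides the actual TPS warp; the following is only a rough, hypothetical illustration of the augmented-self-reference idea (warping the ground-truth color image to obtain a reference), implemented with a coarse random offset grid and grid_sample rather than a true thin-plate spline:
```python
import torch
import torch.nn.functional as F


def random_warp_reference(color, grid_size=4, jitter=0.05):
    # color: (B, C, H, W) ground-truth color images.
    b, c, h, w = color.shape
    # Identity sampling grid in normalized [-1, 1] coordinates.
    theta = torch.eye(2, 3, device=color.device).unsqueeze(0).repeat(b, 1, 1)
    grid = F.affine_grid(theta, (b, c, h, w), align_corners=False)
    # Coarse random offsets, bilinearly upsampled into a smooth deformation field.
    offsets = (torch.rand(b, 2, grid_size, grid_size, device=color.device) - 0.5) * 2 * jitter
    offsets = F.interpolate(offsets, size=(h, w), mode='bilinear', align_corners=False)
    grid = grid + offsets.permute(0, 2, 3, 1)
    return F.grid_sample(color, grid, padding_mode='border', align_corners=False)
```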
  5. Train
  • python train.py
  • arguments
    • load_model: True/False
    • cuda_visible: CUDA_VISIBLE_DEVICES (e.g. 1)
  6. Test
  • python test.py
  • arguments
    • image_path: path of the folder containing the images to convert
    • cuda_visible: CUDA_VISIBLE_DEVICES

Results

Sketch / Reference / Result (comparison images)

Observation & Discussion

  • In Eq. (1), I could not scale by the number of activation maps; instead, I rescaled the activation map.
  • In Eq. (5), since the negative region is ambiguous, I implemented the negative as the same region taken from other samples in the batch (see the sketch after this list).
  • In Eq. (9), which is less clearly specified than Eq. (8), I computed the style (Gram) loss on the relu5_1 activation map (also sketched below).
  • In this experiment, there was little difference in quality with or without the similarity-based triplet loss; it converged from about 20 to 0 within one epoch and changed little afterward.
  • When test images were predicted every epoch after the content loss had converged, the difference in color quality between epochs was still remarkable.
  • The converged adversarial losses of the generator and discriminator were 0.7 ~ 0.8 and 0.15 ~ 0.2, respectively.
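The two implementation choices above for Eq. (5) and Eq. (9) can be sketched roughly as follows; the function names, the cosine similarity, and the margin value are assumptions rather than the repository's exact code:
```python
import torch
import torch.nn.functional as F


def similarity_triplet_loss(query, key, margin=1.0):
    # query, key: (B, N, C) flattened feature vectors from the sketch and reference branches.
    # The positive is the matching region; the negative is the same region taken from
    # another sample in the batch (requires batch size > 1).
    key_neg = torch.roll(key, shifts=1, dims=0)
    sim_pos = F.cosine_similarity(query, key, dim=-1)
    sim_neg = F.cosine_similarity(query, key_neg, dim=-1)
    return F.relu(sim_neg - sim_pos + margin).mean()


def gram_style_loss(feat_fake, feat_real):
    # feat_*: (B, C, H, W) relu5_1 activations of a pretrained VGG network.
    def gram(f):
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)
    return F.l1_loss(gram(feat_fake), gram(feat_real))
```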

Code Reference