Left : Generated Image / Right : Ground Truth
This project trains a GAN-based model on human handwriting and generates character images that reflect the writer's style. The model is first pre-trained on a large set of digital font character images, and then fine-tuned through transfer learning on a small number of human handwritten character images.
All details about this project are described in the blog post (in Korean).
The basic model architecture is a GAN, which consists of a Generator and a Discriminator.
- The Generator takes a Gothic-type character image as input and performs style transfer on it. Unlike a vanilla GAN, it contains an Encoder and a Decoder. The Generator improves the quality of its generated images as they are evaluated by the Discriminator.
- The Discriminator takes real or fake images and computes the probability that each one is real. At the same time, it also predicts the font category.
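The two Discriminator outputs described above can be combined into a single loss. The following is a minimal sketch, assuming a sigmoid cross-entropy term for the real/fake output and a softmax cross-entropy term for the font category (a common choice in AC-GAN-style models). The function names and the equal weighting of the two terms are assumptions for illustration and are not taken from this repository.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def real_fake_loss(real_logit, is_real):
    """Sigmoid cross-entropy on the real/fake logit (assumed loss form)."""
    # -log(sigmoid(x)) == softplus(-x); -log(1 - sigmoid(x)) == softplus(x)
    return softplus(-real_logit) if is_real else softplus(real_logit)


def category_loss(category_logits, true_category):
    """Softmax cross-entropy on the font category logits (assumed loss form)."""
    m = max(category_logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in category_logits))
    return log_sum_exp - category_logits[true_category]


def discriminator_loss(real_logit, category_logits, is_real, true_category):
    """Sum of both terms; equal weighting is an assumption of this sketch."""
    return real_fake_loss(real_logit, is_real) + category_loss(
        category_logits, true_category
    )
```

With a real/fake logit of 0 and four equal category logits, the Discriminator is maximally unsure on both outputs, so the loss equals log(2) + log(4).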
This is a 3D illustration of the Encoder and Decoder. After the Encoder extracts features from the image, the font category vector is concatenated to the end of the feature vector. In addition, the intermediate features extracted at each Encoder step are passed to the paired Decoder step, where they are decoded. This is the U-Net architecture.
The pre-training process was inspired by, and built with help from, the zi2zi project by kaonashi-tyc.
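The data flow described above can be sketched with plain Python lists standing in for feature tensors. This is only an illustration of the wiring (category vector appended at the bottleneck, Encoder outputs paired with Decoder steps in reverse order). The names `concat_category`, `decode_with_skips`, and `decode_step` are hypothetical and do not come from `models.py`.

```python
def concat_category(features, category_vector):
    """Append the font category vector to the end of the encoder features."""
    return list(features) + list(category_vector)


def decode_with_skips(encoder_outputs, category_vector, decode_step):
    """Run a U-Net style decoder over stand-in feature vectors.

    encoder_outputs: list of feature vectors, ordered shallowest to deepest.
    decode_step: hypothetical callable standing in for one decoder layer.
    """
    if not encoder_outputs:
        raise ValueError("at least one encoder output is required")
    # Bottleneck: deepest encoder features plus the category vector.
    x = concat_category(encoder_outputs[-1], category_vector)
    # Visit the remaining encoder outputs deepest-first, so each decoder
    # step is paired with the encoder step it mirrors.
    for skip in reversed(encoder_outputs[:-1]):
        x = decode_step(x)
        x = x + list(skip)  # skip connection: concatenate paired features
    return decode_step(x)
```

Passing an identity function as `decode_step` makes the concatenation order visible: the bottleneck features come first, then the category vector, then the Encoder features from deepest to shallowest.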
[Pre-Training] Data : 75,000 images / 150 epoch
First, the model is trained from scratch for 150 epochs.
- Epochs 1~30 : L1_penalty=100, Lconst_penalty=15
- Epochs 31~150 : L1_penalty=500, Lconst_penalty=1000
During the first 30 epochs, which are still the early stage, L1 loss is weighted more heavily so that the model learns the overall shape first. After that, constant loss is weighted more heavily so that the model learns finer details and produces sharper images. Constant loss was introduced in DTN.
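The two-stage weighting can be written as a small schedule function. This is a minimal sketch: the penalty values are the ones listed above, while the function names and the way the terms are summed into one Generator loss are assumptions for illustration. The weights used during transfer learning are not stated here, so the sketch rejects epochs outside 1 to 150 rather than guessing.

```python
def penalty_schedule(epoch):
    """Return the pre-training loss weights for a 1-based epoch number."""
    if not 1 <= epoch <= 150:
        # Weights outside pre-training are not documented; do not guess.
        raise ValueError("schedule is only defined for epochs 1 to 150")
    if epoch <= 30:
        # Early stage: L1 loss dominates so the overall shape is learned first.
        return {"L1_penalty": 100, "Lconst_penalty": 15}
    # Later stage: constant loss dominates to sharpen details.
    return {"L1_penalty": 500, "Lconst_penalty": 1000}


def generator_loss(adversarial_loss, l1_loss, const_loss, epoch):
    """Weighted sum of the loss terms (the summation form is an assumption)."""
    weights = penalty_schedule(epoch)
    return (
        adversarial_loss
        + weights["L1_penalty"] * l1_loss
        + weights["Lconst_penalty"] * const_loss
    )
```

For example, `penalty_schedule(30)` still returns the early-stage weights, and `penalty_schedule(31)` switches to the later-stage weights.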
[Transfer Learning] Data : 210 images / 550 epoch
The model pre-trained for 150 epochs now learns human handwriting. The GIF shows the learning process from epoch 151 to epoch 550. This is many more epochs than pre-training, but it takes much less time because the data set is small.
The upper image is the Ground Truth written by a human, and the lower image is the generated fake image.
None of the 13 Korean characters in the image are contained in the training data set. This shows that the model can generate unseen characters even though it was trained on only part of the full Korean character set.
Interpolation is an experiment that explores the latent space learned by the model, as introduced in DCGAN. The GIF shows that intermediate fonts exist between one font type and another. This is evidence that the model has learned the category vector space properly, rather than just 'memorizing' characters.
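One simple way to produce such intermediate fonts is to blend two font category vectors linearly and decode each blend. The sketch below shows only the blending step on plain lists. The function name `interpolate_categories` is hypothetical, and linear blending is an assumption about how the frames were produced.

```python
def interpolate_categories(vec_a, vec_b, steps):
    """Return `steps` vectors moving linearly from vec_a to vec_b, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(vec_a) != len(vec_b):
        raise ValueError("category vectors must have the same length")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at vec_a, 1.0 at vec_b
        frames.append([(1 - t) * a + t * b for a, b in zip(vec_a, vec_b)])
    return frames
```

Each returned vector would then be fed to the Generator in place of a single font's category vector, giving one frame of the interpolation per vector.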
common
├── dataset.py # load dataset
├── function.py # deep learning functions : conv2d, relu etc.
├── models.py # Generator(Encoder, Decoder), Discriminator
├── train.py # model Trainer
└── utils.py # data pre-processing etc.
get_data
├── font2img.py # font.ttf -> image
└── package.py # .png -> .pkl
Code derived and rehashed from:
- zi2zi by kaonashi-tyc
- tensorflow-hangul-recognition by IBM
- pix2pix-tensorflow by yenchenlin
- Domain Transfer Network by yunjey
- ac-gan by buriburisuri
- dc-gan by carpedm20
- original pix2pix torch code by phillipi
Apache 2.0