awei669/VQ-Font

Dataset Preparation

Closed this issue · 7 comments

I am confused about dataset preparation. In Data Preparation, you have described the structure as given in the image.

[image: dataset structure from the readme]

But in one of the issues, you have mentioned the following structure. Can you help me with this?

[image: dataset structure from the issue]

Sorry for the confusion; the structure in the readme is correct. In the first stage (VQ-VAE pre-training), only the 3000 training characters of the content font are used for training, and the remaining 500 characters are used to test generalization. In the second stage (training the font generator), there is no need to divide training and test characters within the training set and test set: they are split by train_unis and val_unis when generating train.json.
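To illustrate the split described above, here is a minimal sketch of producing train_unis.json (3000 seen characters) and val_unis.json (500 unseen characters) from a character pool. The pool of codepoints, the shuffling, and the output format are all assumptions for illustration, not the repo's exact procedure.

```python
import json
import random

# Hypothetical pool of 3500 character codepoints (assumption: the real
# pool comes from the dataset's character list, not a contiguous range).
all_unis = [f"{cp:04X}" for cp in range(0x4E00, 0x4E00 + 3500)]

random.seed(0)
random.shuffle(all_unis)

# 3000 seen (training) unicodes, 500 unseen (validation) unicodes.
train_unis, val_unis = all_unis[:3000], all_unis[3000:]

with open("train_unis.json", "w") as f:
    json.dump(train_unis, f)
with open("val_unis.json", "w") as f:
    json.dump(val_unis, f)

print(len(train_unis), len(val_unis))  # 3000 500
```

The two JSON files are then passed as --seen_unis_file and --unseen_unis_file when building the lmdb.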

So while preparing the lmdb, what should be passed to --content_font: images generated from train_unis.json or from train_val_all_characters.json?

python3 build_meta4train.py \
    --saving_dir ../results/chinese_dataset/ \
    --content_font ../datasets/images/content_font/ \
    --train_font_dir ../datasets/images/train \
    --val_font_dir ../datasets/images/val \
    --seen_unis_file ../meta/train_unis.json \
    --unseen_unis_file ../meta/val_unis.json

For the second stage, the content font directory used to build the lmdb should contain all 3500 (train + val) characters, that is, ../datasets/images/content_font/train_val/.
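A minimal sketch of assembling that train_val directory by merging the train and val content-font images into one folder. The directory layout and the dummy placeholder files below are assumptions for illustration (the real directories hold 3000 and 500 character images).

```python
import shutil
from pathlib import Path

# Hypothetical layout: content_font_demo/train and content_font_demo/val
# hold the content-font character images; we merge them into train_val.
root = Path("content_font_demo")
for part, n in [("train", 3), ("val", 2)]:  # stand-ins for 3000 / 500
    d = root / part
    d.mkdir(parents=True, exist_ok=True)
    for i in range(n):
        (d / f"{part}_{i:04d}.png").write_bytes(b"")  # placeholder images

dst = root / "train_val"
dst.mkdir(exist_ok=True)

copied = 0
for part in ("train", "val"):
    for img in sorted((root / part).glob("*.png")):
        shutil.copy2(img, dst / img.name)
        copied += 1

print(f"copied {copied} images into {dst}")  # copied 5 images
```

The merged directory is then what gets passed as --content_font when building the second-stage lmdb.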

I got it. Thank you so much for your time and prompt replies.

Dear author, could you please provide your dataset? Thank you very much.

@Djs-Champion Hello, due to copyright reasons I cannot provide the dataset directly.

Please read the Data Preparation section of the Readme carefully. For building the lmdb, you can refer to issue #6.