awei669/VQ-Font

Dataset Preparation

Closed this issue · 7 comments

I am confused about dataset preparation. In Data Preparation, you have described the structure as given in the image.

[image: dataset structure from the readme]

But in one of the issues, you have mentioned the following structure. Can you help me with this?

[image: dataset structure from the issue]

Sorry for the confusion; the structure in the readme is correct. In the first stage (VQ-VAE pre-training), only the 3000 training characters of the content font are used for training, and the remaining 500 characters are used to test generalization. In the second stage (training the font generator), there is no need to divide training and test characters within the training set and test set: they are split by train_unis and val_unis when generating train.json.
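To illustrate the split described above, here is a minimal sketch of producing train_unis.json (3000 seen characters) and val_unis.json (500 unseen characters) from a character pool. The pool of codepoints, the shuffling, and the output format are all assumptions for illustration, not the repo's exact procedure.

```python
import json
import random

# Hypothetical pool of 3500 character codepoints (assumption: the real
# pool comes from the dataset's character list, not a contiguous range).
all_unis = [f"{cp:04X}" for cp in range(0x4E00, 0x4E00 + 3500)]

random.seed(0)
random.shuffle(all_unis)

# 3000 seen (training) unicodes, 500 unseen (validation) unicodes.
train_unis, val_unis = all_unis[:3000], all_unis[3000:]

with open("train_unis.json", "w") as f:
    json.dump(train_unis, f)
with open("val_unis.json", "w") as f:
    json.dump(val_unis, f)

print(len(train_unis), len(val_unis))  # 3000 500
```

The two JSON files are then passed as --seen_unis_file and --unseen_unis_file when building the lmdb.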

So while preparing the lmdb, what should be passed to --content_font: images generated from train_unis.json or from train_val_all_characters.json?

python3 build_meta4train.py \
    --saving_dir ../results/chinese_dataset/ \
    --content_font ../datasets/images/content_font/ \
    --train_font_dir ../datasets/images/train \
    --val_font_dir ../datasets/images/val \
    --seen_unis_file ../meta/train_unis.json \
    --unseen_unis_file ../meta/val_unis.json

For the second stage, the content font directory used to build the lmdb should contain all 3500 (train + val) characters, that is, ../datasets/images/content_font/train_val/.
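A minimal sketch of assembling that train_val directory by merging the train and val content-font images into one folder. The directory layout and the dummy placeholder files below are assumptions for illustration (the real directories hold 3000 and 500 character images).

```python
import shutil
from pathlib import Path

# Hypothetical layout: content_font_demo/train and content_font_demo/val
# hold the content-font character images; we merge them into train_val.
root = Path("content_font_demo")
for part, n in [("train", 3), ("val", 2)]:  # stand-ins for 3000 / 500
    d = root / part
    d.mkdir(parents=True, exist_ok=True)
    for i in range(n):
        (d / f"{part}_{i:04d}.png").write_bytes(b"")  # placeholder images

dst = root / "train_val"
dst.mkdir(exist_ok=True)

copied = 0
for part in ("train", "val"):
    for img in sorted((root / part).glob("*.png")):
        shutil.copy2(img, dst / img.name)
        copied += 1

print(f"copied {copied} images into {dst}")  # copied 5 images
```

The merged directory is then what gets passed as --content_font when building the second-stage lmdb.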

I got it. Thank you so much for your time and prompt replies.

Dear author, could you please provide your dataset? Thank you very much.

@Djs-Champion Hello, due to copyright reasons I cannot provide the dataset directly.

Please read the Data Preparation section of the Readme carefully. For building the lmdb, you can refer to issue #6.