Question about your dataset size

Question


Johnson-yue opened this issue 4 years ago · 2 comments

Hi, great work, and thank you for sharing. I have a question about the size of your dataset.

You report 482 Chinese fonts with a total of 19,514 characters. The average should then be 19,514 / 482 ≈ 40 characters per font, so why do you say it is 6,654?

Answer 1 · 2020-09-24T07:54:19.000Z

@Johnson-yue
Hi, thanks for your interest and for the very first question about our paper.

By "19,514 characters" we did not mean the number of images (or glyphs),
but the number of distinct characters, i.e., Unicode code points.

Thus, each font has 6,654 images on average, and the union of the Unicode code points over all fonts in the train set contains 19,514 characters.
On the other hand, the total number of "images" in the train set is 6,654 * 482 = 3,207,228.
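The distinction is between counting glyph images, one per (font, character) pair, and counting distinct characters, the union of code points across fonts. Dividing the union (19,514) by the number of fonts mixes the two. The sketch below uses small, hypothetical toy fonts (not the real dataset) to show how the two counts differ, then checks the total reported above.

```python
# Hypothetical toy data: each font maps to the set of Unicode characters it covers.
fonts = {
    "font_a": {"一", "二", "三"},
    "font_b": {"二", "三", "四", "五"},
    "font_c": {"三", "六"},
}

def dataset_stats(fonts):
    """Return (num_images, num_distinct_characters, avg_images_per_font)."""
    # One glyph image per (font, character) pair, so images add up per font.
    num_images = sum(len(chars) for chars in fonts.values())
    # Distinct characters are the union of code points over all fonts.
    distinct = set().union(*fonts.values())
    return num_images, len(distinct), num_images / len(fonts)

images, chars, avg = dataset_stats(fonts)
print(images, chars, avg)  # 9 6 3.0

# Figures from the thread: 482 fonts with 6,654 images per font on average.
print(6654 * 482)  # 3207228
```

Because fonts share many characters, the union (6 here, 19,514 in the paper) is much smaller than the image count (9 here, 3,207,228 in the paper).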

Please don't hesitate to bother me if you have any other questions.

Answer 2 · 2020-09-24T08:48:57.000Z

Hi, I am building the LMDB file, sorry for the late reply.
Building the LMDB is very slow: every 6,734 images take 544 s.
For your LMDB with 482 fonts × 6,654 Unicode characters, how large is the file?
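As a rough sense of scale, the throughput reported here can be extrapolated to the full 3,207,228 images. This is a back-of-the-envelope sketch that assumes build time grows linearly with the number of images; it does not answer the file-size question, which depends on how the images are encoded.

```python
# Figures reported in the thread. Assumption: build time scales linearly.
SAMPLE_IMAGES = 6734
SAMPLE_SECONDS = 544
TOTAL_IMAGES = 482 * 6654  # 3,207,228 images, from the author's reply

def estimate_build_hours(total_images, sample_images, sample_seconds):
    """Extrapolate total build time in hours from one measured sample."""
    seconds_per_image = sample_seconds / sample_images
    return total_images * seconds_per_image / 3600

hours = estimate_build_hours(TOTAL_IMAGES, SAMPLE_IMAGES, SAMPLE_SECONDS)
print(f"{hours:.1f} hours")  # 72.0 hours
```

At about 0.08 s per image, the full train set would take roughly three days to write at this rate.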

So you mean that the set of Unicode characters available differs from font to font, and that the 482 fonts together cover 19,514 characters. Yes, I understand now. Thank you.

By the way, I tested AGIS-Net: using their GitHub repo, I could not reproduce the performance reported in their paper. After I asked two questions, they closed the issue... Thanks again for your reply.