thunlp/ERNIE

Missing numbers using wikiextractor

mpagli opened this issue · 1 comment

Hey,

Thanks for the nice work. I just wanted to point out an open issue in wikiextractor, in case you are not aware of it: attardi/wikiextractor#189

Some numbers are missing in the output. Here is an example:

Andorra is the <a href="European%20microstates">sixth-smallest nation in Europe</a>, having an area of and a population of approximately .

Instead of:

Andorra is the <a href="European%20microstates">sixth-smallest nation in Europe</a>, having an area of 468 square kilometers (181 sq mi) and a population of approximately 77,006
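A rough way to check whether an existing extracted corpus is affected is to scan it for the pattern visible above: a word that usually introduces a quantity ("of", "approximately") followed directly by punctuation or by "and"/"or". The sketch below is a hypothetical heuristic, not part of wikiextractor or ERNIE. The names `looks_truncated`, `strip_tags` and `count_suspicious` are made up for illustration, and the trigger-word list is an assumption that will produce some false positives and miss other kinds of dropped values.

```python
import re

# Hypothetical heuristic (not part of wikiextractor or ERNIE): a word that
# usually introduces a quantity, directly followed by punctuation or by
# "and"/"or", suggests the number after it was dropped during extraction.
_DROPPED_NUMBER = re.compile(
    r"\b(?:of|approximately|roughly|nearly)\s*(?=[.,;:]|\s+(?:and|or)\b)",
    re.IGNORECASE,
)


def strip_tags(line: str) -> str:
    """Remove inline markup such as the <a href=...> links in the output."""
    return re.sub(r"<[^>]+>", "", line)


def looks_truncated(line: str) -> bool:
    """Return True if the line contains a likely dropped-number pattern."""
    return bool(_DROPPED_NUMBER.search(strip_tags(line)))


def count_suspicious(lines) -> int:
    """Count lines in an iterable of text lines that look truncated."""
    return sum(looks_truncated(line) for line in lines)
```

Applied to the two Andorra sentences, `looks_truncated` flags the extracted version and not the expected one; running `count_suspicious` over an extracted file gives only an approximate indication of how widespread the problem is.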

Are the published results based on a Wikipedia corpus with missing numbers, or is this a recent bug?

zzy14 commented

I think it is a recent bug. The published results are based on the 2018 version.