Missing numbers using wikiextractor
mpagli opened this issue · 1 comment
mpagli commented
Hey,
Thanks for the nice work. I just wanted to point out an open wikiextractor issue, in case you are not aware of it: attardi/wikiextractor#189
Some numbers are missing in the output. Here is an example:
Andorra is the <a href="European%20microstates">sixth-smallest nation in Europe</a>, having an area of and a population of approximately .
Instead of:
Andorra is the <a href="European%20microstates">sixth-smallest nation in Europe</a>, having an area of 468 square kilometers (181 sq mi) and a population of approximately 77,006
Are the published results based on a wiki corpus with missing numbers or is it a recent bug?
zzy14 commented
I think it is a recent bug. The published results are based on the 2018 version.
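For anyone who wants to check whether their own extracted corpus shows the symptom above, here is a minimal sketch of a heuristic scan. It is not part of wikiextractor or of this repository; `GAP_PATTERN` and `find_gaps` are hypothetical names, and the word list is an assumption based only on the Andorra example. It flags places where a word that usually introduces a quantity is followed directly by a conjunction or punctuation.

```python
import re

# Heuristic (assumption): words such as "of" or "approximately" followed
# directly by "and"/"or" or by punctuation suggest a dropped number.
GAP_PATTERN = re.compile(
    r"\b(?:of|approximately|about|around)\s+(?:(?:and|or)\b|[.,;])",
    re.IGNORECASE,
)

def find_gaps(line):
    """Return the suspicious fragments found in one line of extracted text."""
    return [m.group(0) for m in GAP_PATTERN.finditer(line)]

# The two sentences quoted in this issue.
broken = ('Andorra is the <a href="European%20microstates">sixth-smallest '
          'nation in Europe</a>, having an area of and a population of '
          'approximately .')
intact = ('Andorra is the <a href="European%20microstates">sixth-smallest '
          'nation in Europe</a>, having an area of 468 square kilometers '
          '(181 sq mi) and a population of approximately 77,006')

print(find_gaps(broken))  # fragments flagged in the broken sentence
print(find_gaps(intact))  # nothing flagged in the intact sentence
```

This will produce some false positives and miss other gap shapes, so it is probably best used to estimate how widespread the problem is in a dump rather than to filter lines automatically.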