Parsing seems to exclude some parts of the page
franluca commented
Thanks for the great library!
I noticed that some of the resulting entries miss meaningful content. For example, the entry
{"id": "75159532", "revid": "39374154", "url": "https://en.wikipedia.org/wiki?curid=75159532", "title": "Tyszko", "text": "Tyszko is a surname. Notable people with the surname include: "}
is missing the list of notable people.
I'm using the standard command:
python -m wikiextractor.WikiExtractor <dump name> --json -o <output folder>
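In case it helps with triage, here is a rough Python sketch I'd use to look for other entries affected in the same way. It assumes the output was written without compression, as files named like `AA/wiki_00` under the output folder with one JSON object per line, and it relies on a heuristic of my own (the text ends with a colon, as in the Tyszko entry), which is not something WikiExtractor provides. `find_truncated_entries` and `scan_output_folder` are just hypothetical helper names.

```python
import json
import sys
from pathlib import Path
from typing import Iterable, Iterator


def find_truncated_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entries whose text ends with a colon.

    Heuristic (an assumption): an intro like "... include: " with nothing
    after it suggests the list that followed was not extracted. It can
    produce false positives and misses other kinds of dropped content.
    """
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        entry = json.loads(line)
        if entry.get("text", "").rstrip().endswith(":"):
            yield entry


def scan_output_folder(folder: str) -> Iterator[dict]:
    """Scan every uncompressed wiki_* file under the output folder."""
    for path in sorted(Path(folder).rglob("wiki_*")):
        # Skip directories and compressed files (e.g. wiki_00.bz2).
        if not path.is_file() or path.suffix:
            continue
        with path.open(encoding="utf-8") as f:
            yield from find_truncated_entries(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_truncated.py <output folder>
    for entry in scan_output_folder(sys.argv[1]):
        print(entry["id"], entry["title"])
```

On my output this would list the Tyszko entry, along with any other article whose text stops right after an introductory colon.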
Am I missing something?
Thanks again,
Luca