attardi/wikiextractor

Option to remove blank pages?

AngledLuffa opened this issue · 1 comment

Recent versions of the Wikipedia dumps contain blank pages. For example, the English dump as of 2023-02-01 now starts with AccessibleComputing, which is a redirect to "Computer accessibility". This produces a blank page in the extracted Wikipedia:

<doc id="10" url="https://en.wikipedia.org/wiki?curid=10" title="AccessibleComputing">
AccessibleComputing



</doc>

Is it possible to eliminate those, perhaps as an option?
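Until such an option exists, one possible workaround is to post-process the extracted files. The following is a minimal sketch, not a WikiExtractor feature. It assumes the output format shown above: the opening `<doc ...>` tag sits on its own line, the title is repeated as the first body line, and `</doc>` closes the page on its own line. The helper names `is_blank`, `drop_blank_docs`, and `clean_file` are hypothetical.

```python
from typing import Iterable, Iterator, List


def is_blank(doc_lines: List[str]) -> bool:
    """Return True if a <doc> block has no text beyond its title line.

    doc_lines[0] is the opening <doc ...> tag and doc_lines[-1] is </doc>.
    Assumption: the extractor repeats the page title as the first body line,
    so that line is skipped before checking for content.
    """
    body = doc_lines[2:-1]
    return all(not line.strip() for line in body)


def drop_blank_docs(lines: Iterable[str]) -> Iterator[str]:
    """Stream lines of extracted output, omitting <doc> blocks that are blank."""
    buffer: List[str] = []
    inside = False
    for line in lines:
        if not inside and line.startswith("<doc "):
            # Start buffering a new page.
            inside = True
            buffer = [line]
        elif inside:
            buffer.append(line)
            if line.strip() == "</doc>":
                # Page is complete: emit it only if it has real content.
                inside = False
                if not is_blank(buffer):
                    yield from buffer
                buffer = []
        else:
            # Text outside any <doc> block is passed through unchanged.
            yield line
    # An unterminated trailing page is emitted as-is rather than silently dropped.
    yield from buffer


def clean_file(src_path: str, dst_path: str) -> None:
    """Copy one extracted file to dst_path, skipping blank pages."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(drop_blank_docs(src))
```

Processing the file line by line keeps memory bounded to a single page, which matters for full-dump output. Because the check treats any page whose body is only the repeated title plus whitespace as blank, it removes redirect stubs like AccessibleComputing but would also remove any genuine page that happens to extract to nothing.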