Mediawiki.allpages(limit=999999999) capped at 500 (?)
peanutbutterandcrackers opened this issue · 4 comments
Hello there,
I need to get all the pages for a given wiki instance. I tried `en_wikipedia.allpages(limit=999999999999)` (no query specified, in hopes of having ALL the pages returned), and yet it only returns 500 values (per `len()`).
Is it possible to get ALL the pages for a given wiki, please? Perhaps `limit=-1` or `limit=None` to return the entire page listing?
Thank you for the really easy-to-use module!
Sadly, that is a limitation set by the wiki; the library passes along the value you set and tries to pull all the data. Whatever the wiki provides is what gets returned.

It could be possible to make it so that a user could do the work themselves using the `apfrom` value.
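For reference, `apfrom` belongs to the raw MediaWiki API's `list=allpages` module; each response carries a `continue.apcontinue` title that can be fed into the next request's `apfrom` to page through results. A hedged sketch of the request parameters (the helper function here is illustrative, not part of this library):

```python
# Illustrative helper: build the query parameters for one page of the
# MediaWiki list=allpages module. Parameter names (apfrom, aplimit)
# come from the MediaWiki API; the function itself is an assumption.
def allpages_params(apfrom=None, aplimit=500):
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": aplimit,   # most wikis cap this at 500 for anonymous clients
        "format": "json",
    }
    if apfrom:
        # Continue listing from this title (inclusive).
        params["apfrom"] = apfrom
    return params
```

Each request would be sent to the wiki's `api.php` endpoint, and the `continue.apcontinue` value from the JSON response (when present) becomes the next call's `apfrom`.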
@barrust - I see. Thank you. Could the change be made to allow one to use `apfrom`, please? It would be really helpful.

Edit: It seems the `apfrom` is the query. I might have to play around with it, then. Is there any other way to scrape all the sub-URLs of a given (wiki) URL that you'd recommend I look into?
Update: I think I might have figured something out: setting `query` to the final item of the 500-item list and calling the function again seems to do the trick. Thank you very much for this super neat module, again! :)
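The continuation trick described above can be sketched as a loop. This is a hedged sketch: `fetch_batch` stands in for the library's `allpages` call (the real signature may differ), and the duplicate-skip assumes the seed title is repeated at the start of the next batch, since `apfrom` is inclusive:

```python
# Sketch of the workaround: repeatedly fetch batches, seeding each call
# with the last title seen so far. `fetch_batch` is a stand-in for the
# real allpages call -- names here are illustrative assumptions.
def iter_all_pages(fetch_batch, batch_size=500):
    """Yield every page title by paging on the last-seen title."""
    last = ""
    while True:
        batch = fetch_batch(query=last, limit=batch_size)
        if not batch:
            return
        for title in batch:
            # apfrom-style continuation is inclusive, so the seed title
            # may reappear at the head of the batch; skip it.
            if title != last:
                yield title
        if len(batch) < batch_size:
            return  # short batch means we reached the end of the listing
        last = batch[-1]
```

Used with the real client, `fetch_batch` would simply wrap the wiki object's `allpages` call.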
Glad it is working for you! If you are scraping that much information, I highly recommend setting your own user agent string!
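A minimal sketch of composing such a string, assuming the client accepts a user-agent argument at construction (check the library's docs for the exact parameter name). The Wikimedia User-Agent policy asks for a tool name, a version, and a way to contact you:

```python
# Hypothetical helper: build a descriptive User-Agent string with a
# tool name, version, and contact address, per the Wikimedia policy.
def build_user_agent(tool: str, version: str, contact: str) -> str:
    return f"{tool}/{version} ({contact})"

ua = build_user_agent("my-wiki-scraper", "0.1", "me@example.com")
# Assumption: the client takes a user-agent keyword at construction, e.g.:
# from mediawiki import MediaWiki
# site = MediaWiki(url="https://en.wikipedia.org/w/api.php", user_agent=ua)
```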
I see. Thank you very much. I will do so.