Combine property pulls into single requests
barrust opened this issue · 3 comments
In order to reduce the load on MediaWiki servers, it would be good to combine as many of the property requests as possible. Anything that can be pulled in a single request should be.
Some possible pitfalls:
- Figuring out how to properly use the continue parameter when multiple elements are being returned
- Determining which properties should be combined into a single MediaWiki API request
The best possible outcome would be to pull some of the properties in the same request that pulls the main page information. This will require quite a bit of rework, but I think it would be a great addition and would reduce the number of calls against the MediaWiki site.
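As a rough illustration of what combining property pulls means at the request level, the sketch below joins several `prop` modules into one parameter set, using the pipe separator the MediaWiki Action API accepts for multi-value parameters. The helper name `build_combined_params` and the chosen modules are assumptions for illustration, not part of the library.

```python
def build_combined_params(title, props):
    """Build one query parameter set that requests several prop modules.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "action": "query",
        "format": "json",
        "titles": title,
        # Multi-value parameters are pipe-separated in the Action API.
        "prop": "|".join(props),
    }


params = build_combined_params("Python", ["info", "categories", "links"])
print(params["prop"])
```

One request built this way replaces three separate ones, at the cost of more involved continuation handling when any of the modules returns partial results.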
@barrust, I've prepared an MVP of a possible solution. You can check it here.
I would like to hear your opinion.
To make the code clearer, I've decided to create Python descriptors in a separate module instead of expanding `MediaWikiPage`.
MediaWikiPageProperty
I've implemented `MediaWikiPageProperty`, which is the base class for all future page properties such as `content`, `categories`, etc.
Every child of `MediaWikiPageProperty` has two functions:
- `get_query_params`: returns the default `query_params`
- `parse_query_data`: gets the required value from the response
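A minimal sketch of how such a base class and one child might look. Only the names `MediaWikiPageProperty`, `get_query_params` and `parse_query_data` come from the description above; the signatures, the `CategoriesProperty` child, and the assumed response shape (a per-page dict with a `categories` list) are illustrative assumptions, and the descriptor machinery is left out.

```python
from abc import ABC, abstractmethod


class MediaWikiPageProperty(ABC):
    """Base class for page properties (signatures are assumed)."""

    @abstractmethod
    def get_query_params(self):
        """Return the default query parameters for this property."""

    @abstractmethod
    def parse_query_data(self, page_data):
        """Extract this property's value from one page's response data."""


class CategoriesProperty(MediaWikiPageProperty):
    """Hypothetical child class for the page's categories."""

    def get_query_params(self):
        return {"prop": "categories", "cllimit": "max"}

    def parse_query_data(self, page_data):
        # Assumes entries shaped like {"ns": 14, "title": "Category:..."}.
        return [entry["title"] for entry in page_data.get("categories", [])]


prop = CategoriesProperty()
sample_page = {"pageid": 1, "categories": [{"ns": 14, "title": "Category:A"}]}
print(prop.parse_query_data(sample_page))
```

Keeping request building and response parsing on the same object is what lets a handler collect parameters from several properties and later hand each one its slice of the combined response.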
MediaWikiPagePropertyHandler
The changes above made it possible to implement the function `get_batch_properties`, which can combine `query_params` and reduce the overall number of queries.
`combine_query_params` is responsible for combining queries.
This function follows these rules:
- According to the MediaWiki API, only one generator is allowed per request.
- The function avoids combining equal properties into a single request. E.g. `Content` and `Summary` will be separated because both contain `prop=extracts`.
  Exception: parameters that specify pages, such as `titles`, `pageids` and `revids`.
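The rules above could be sketched roughly as follows. This is not the code from the MVP: it is a greedy batching illustration written under the assumption that each property contributes a flat dict of string parameters, and the helpers `_can_merge` and `_merge` are hypothetical.

```python
# Parameters that select pages may be shared between combined queries.
PAGE_KEYS = {"titles", "pageids", "revids"}


def _can_merge(batch, params):
    """Check whether params can join an existing batch (hypothetical helper)."""
    for key, value in params.items():
        if key not in batch:
            continue
        if key in PAGE_KEYS:
            # Page selectors must point at the same pages.
            if batch[key] != value:
                return False
        elif key == "generator":
            # Only one generator is allowed per request.
            return False
        elif set(batch[key].split("|")) & set(value.split("|")):
            # Same value twice, e.g. prop=extracts for content and summary.
            return False
    return True


def _merge(batch, params):
    """Fold params into batch, pipe-joining values of shared keys."""
    for key, value in params.items():
        if key in batch and key not in PAGE_KEYS:
            batch[key] = batch[key] + "|" + value
        else:
            batch[key] = value


def combine_query_params(param_sets):
    """Greedily pack parameter sets into as few requests as the rules allow."""
    batches = []
    for params in param_sets:
        for batch in batches:
            if _can_merge(batch, params):
                _merge(batch, params)
                break
        else:
            batches.append(dict(params))
    return batches


content = {"prop": "extracts", "explaintext": "1", "titles": "X"}
summary = {"prop": "extracts", "exintro": "1", "titles": "X"}
categories = {"prop": "categories", "cllimit": "max", "titles": "X"}
batches = combine_query_params([content, summary, categories])
print(len(batches))
```

With these inputs, the content and categories queries share one request while the summary query stays separate, because it also uses `prop=extracts`.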
@shnela this is very interesting and a very different method than I was thinking of. I was imagining a "simpler" (in my mind) solution of just cherry-picking which properties are generally pulled together and merging those. It would require a slightly different version of `_continued_query` that merges the results into a single dictionary object before parsing.
Something like all the pre-populated properties joined together unless they are generators, in which case, they can't be used together.
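A rough sketch of that merging idea, assuming the standard Action API continuation behaviour where each partial response carries a `continue` block whose values are sent back with the next request. The function name `continued_query_merged` and the injected `fetch` callable are hypothetical, chosen so the sketch runs without a network call.

```python
def continued_query_merged(fetch, params):
    """Follow continuation and merge all page data into one dict.

    `fetch` is a hypothetical callable taking request params and
    returning the decoded JSON response as a dict.
    """
    merged = {}
    request = dict(params)
    while True:
        response = fetch(request)
        pages = response.get("query", {}).get("pages", {})
        for page_id, page in pages.items():
            target = merged.setdefault(page_id, {})
            for key, value in page.items():
                if isinstance(value, list):
                    # List-valued props (categories, links, ...) accumulate.
                    target.setdefault(key, []).extend(value)
                else:
                    # Scalar fields (pageid, title, ...) keep the first value.
                    target.setdefault(key, value)
        if "continue" not in response:
            return merged
        # Send every continuation value back with the original parameters.
        request = {**params, **response["continue"]}
```

Parsing then runs once against the merged dictionary, so each property only needs to know its own key and not how many requests it took to fill it.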
Thoughts?