Implement dynamic chunk size and retries for getting wikipedia extracts
Reason (Why?)
Get Wikipedia extracts currently uses a fixed chunk size of 20 when requesting Wikipedia extracts. If the Wikipedia API does not respond correctly, the script aborts.
However, the maximum chunk size stated in the Wikipedia API documentation is 50. Setting the chunk size to 20 was an attempt to avoid exceptions, but exceptions still occur occasionally.
Solution (What?)
The chunk size should be determined automatically: the script should start with a size of 50, decrease the size by a set margin if a request fails, and restart the specific operation with the lower chunk size.
This should not be repeated indefinitely; instead, a retry limit should apply, after which the script aborts with an error. A sketch of this retry loop is given below.
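
A minimal sketch of what this could look like in Python, assuming the script uses `requests`; the function name `fetch_extracts`, the constants, and the exact API parameters are illustrative placeholders rather than the script's actual code:

```python
import requests

WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint

INITIAL_CHUNK_SIZE = 50  # maximum batch size allowed by the Wikipedia API
CHUNK_SIZE_STEP = 10     # margin by which the size is decreased after a failure
MAX_RETRIES = 4          # abort with an error after this many failed attempts


def fetch_extracts(titles):
    """Fetch extracts for all titles, shrinking the chunk size when the API fails."""
    chunk_size = INITIAL_CHUNK_SIZE
    last_error = None
    for _ in range(MAX_RETRIES + 1):
        try:
            extracts = {}
            # Request the extracts in chunks of the current size.
            for start in range(0, len(titles), chunk_size):
                chunk = titles[start:start + chunk_size]
                response = requests.get(WIKIPEDIA_API_URL, params={
                    "action": "query",
                    "prop": "extracts",
                    "explaintext": 1,
                    "format": "json",
                    "titles": "|".join(chunk),
                })
                response.raise_for_status()
                data = response.json()
                if "error" in data:
                    raise RuntimeError(data["error"].get("info", "API error"))
                for page in data["query"]["pages"].values():
                    extracts[page["title"]] = page.get("extract", "")
            return extracts
        except (requests.RequestException, RuntimeError, KeyError) as exc:
            last_error = exc
            # Start the whole operation anew with a smaller chunk size.
            chunk_size = max(1, chunk_size - CHUNK_SIZE_STEP)
    raise RuntimeError(f"Giving up after {MAX_RETRIES} retries: {last_error}")
```

Decreasing by a fixed step (rather than, say, halving) keeps the behaviour predictable and matches the "set margin" described above; the step size and retry limit would be tuning parameters.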
Acceptance criteria
If the Wikipedia API returns an error because the chunk size is too big, the script should retry with a lower chunk size instead of aborting immediately.