samyok/gophergrades

Speed up scraping with concurrent requests

Closed this issue · 2 comments

Currently, when we fetch the calendar for every week in the semester, every single one of those requests begins only after the previous one has completed. Similarly, when we grab general course data for each course we're enrolled in, those also happen one after the other.
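The sequential pattern looks roughly like this (a sketch only; `fetchWeek` and the URLs are hypothetical stand-ins for the real scraper's helpers, with a timer simulating network latency):

```javascript
const DELAY_MS = 50;

function fetchWeek(url) {
  // Simulated network latency; the real scraper would fetch(url) instead.
  return new Promise((resolve) =>
    setTimeout(() => resolve(`data for ${url}`), DELAY_MS)
  );
}

async function scrapeSequentially(urls) {
  const weeks = [];
  for (const url of urls) {
    // Each request starts only after the previous one has completed,
    // so total time grows linearly with the number of requests.
    weeks.push(await fetchWeek(url));
  }
  return weeks;
}
```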

There should be a (not too hard) way to send all of these requests concurrently. To generate the calendar for one semester, we generally need to make 1 sample week + 7 course infos (this varies with the schedule) + 15 individual weeks ≈ 23 fetch requests.
If we could turn these 23 sequential requests into 3 batches of concurrent requests, that would likely shorten scraping time to roughly one eighth (~3/23) of the current time.

I think there's something we can do with Promise.all(). I'll do some reading.
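A minimal sketch of the three-batch idea with Promise.all(): the batch sizes (1 sample week, 7 course infos, 15 weeks) come from the estimate above, but the helper and its timer are hypothetical stand-ins for the real requests.

```javascript
const DELAY_MS = 50;

// Hypothetical stand-in for a real fetch; resolves after DELAY_MS.
const simulatedFetch = (label) =>
  new Promise((resolve) => setTimeout(() => resolve(label), DELAY_MS));

async function scrapeSemester() {
  // Batch 1: the sample week.
  const sampleWeek = await simulatedFetch('sample week');

  // Batch 2: all 7 course infos fire at once; Promise.all resolves
  // when the slowest of them finishes.
  const courseInfos = await Promise.all(
    Array.from({ length: 7 }, (_, i) => simulatedFetch(`course ${i}`))
  );

  // Batch 3: all 15 individual weeks, also concurrent.
  const weeks = await Promise.all(
    Array.from({ length: 15 }, (_, i) => simulatedFetch(`week ${i}`))
  );

  return { sampleWeek, courseInfos, weeks };
}
```

Total wall time is about 3 × DELAY_MS (one latency per batch) rather than 23 × DELAY_MS, which is where the ~3/23 estimate comes from.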

Okay, so I implemented this (commit fe065cd), but for some reason it gives only a trivial speedup, if any.

Before concurrency: [screenshot: before-concurrency]
After concurrency: [screenshot: after-concurrency]

Now the question is why these new, concurrent requests still take so long to complete. @doggu and I have a hypothesis: it's MyU's fault; their server seems to handle only one request at a time. Is that so? How would we verify it? I'm still holding out hope that there's a way to nontrivially speed up these fetch requests.
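One way to test that hypothesis: time a single request, then time two fired concurrently. If the pair takes roughly twice as long as one, the server is likely serializing requests. A sketch of the diagnostic, with a promise chain simulating a serializing server (against the real MyU endpoint you would swap `simulatedRequest` for an actual fetch):

```javascript
const DELAY_MS = 50;

// A promise chain acts as the server's "lock": each new request waits
// for the previous one to finish before its own DELAY_MS of processing.
let serverQueue = Promise.resolve();

function simulatedRequest() {
  const result = serverQueue.then(
    () => new Promise((resolve) => setTimeout(resolve, DELAY_MS))
  );
  serverQueue = result;
  return result;
}

async function timeIt(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}
```

If `timeIt(() => Promise.all([simulatedRequest(), simulatedRequest()]))` comes out near 2 × DELAY_MS instead of 1 ×, that is the serializing-server signature, and concurrency on our side cannot help.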

Going to close this, as this feature seems to have been added in the recent Spring 2023 feature additions.