[Q&A] Concurrent fetching of stargazer pages?
Byron opened this issue · 8 comments
When fetching a project like this it takes a while to obtain all information needed to produce a graph.
When looking at the network activity, it's clear that each page is fetched one after the other.
As the number of pages is known in advance, it should be possible to fetch multiple pages at the same time, speeding up the graph generation considerably.
What are your thoughts on this?
Edit: Some praise where it is due: I find myself using the star tracking a lot, as it has exactly the features I seek along with perfect usability. Thank you!
Thank you @Byron for the kind words!
Indeed, fetching the pages is done one by one, so it might be a good idea to parallelize it.
It will create some challenges around collecting the data in the right order and limiting the number of concurrent connections to avoid GitHub blocking them (especially for repos with many stars).
Do you think you will have the time to look into this?
Indeed, I think it could be implemented with the desired number of 'worker' threads which receive their work through some channel - the page to fetch. The result, along with the page number, would be sent to some reducer which brings the results back into the right order, so a standard fan-out-fan-in should do the job. This way, the number of concurrent connections would be controlled perfectly.
As the producer runs out of pages, the send part of the channel should be closed to let all workers stop gracefully. This in turn should close the send parts of the results channel, which allows the reducer to know when it's done.
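Since JavaScript has no built-in channels, the fan-out-fan-in idea above can be approximated with a shared work queue that a fixed number of async 'workers' drain, writing each result into a slot indexed by page number so order is preserved. This is only a sketch of the scheme described above; `fetchPage` is a hypothetical stand-in for the real GitHub API call (here it just simulates latency).

```javascript
// Hypothetical stand-in for the real paginated GitHub API request.
const fetchPage = (page) =>
  new Promise((resolve) => setTimeout(() => resolve({ page, stars: [] }), 10));

async function fetchAllPages(pageCount, workerCount) {
  // Shared work queue: the pages still to be fetched.
  const pages = Array.from({ length: pageCount }, (_, i) => i + 1);
  const results = new Array(pageCount);

  // Fan-out: each 'worker' repeatedly takes the next page off the queue
  // until it is empty, capping concurrent connections at workerCount.
  const worker = async () => {
    let page;
    while ((page = pages.shift()) !== undefined) {
      // Fan-in: writing into a slot keyed by page number restores order.
      results[page - 1] = await fetchPage(page);
    }
  };

  // Workers stop on their own once the queue is drained, which plays the
  // role of closing the channel's send side.
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results; // already in page order thanks to the indexed writes
}
```

The empty queue takes the place of a closed channel: each worker's loop ends naturally, and `Promise.all` resolving is the reducer's signal that all results are in.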
If you could point me at the mechanism to do channels (with the desired semantics) and 'workers' in JavaScript, I would happily give it a shot :).
Thanks again!
I'm sorry for the late response. The approach you described sounds right to me. Regarding the implementation - there is this concept of web workers which enables real parallelization in client-side JavaScript. There are numerous articles and tutorials on how to integrate web workers with React (you can google it, you'll find them easily 🙂).
However, in this case, since these jobs are mostly IO-intensive, I'm not sure this is necessary. We can just send a few requests in parallel and handle the results as they come (in a callback). This callback can also take care of sending the next request in line.
We do need to take care of organizing the results in the right order. But actually, now that I think about it, maybe we should first find out whether that's even needed - perhaps the chart and the rest of the components can handle unordered data as well, I'm not sure.
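The callback-driven pool described above can be sketched without web workers: start a fixed number of requests, and let each completion kick off the next page in line. Again `fetchPage` is a hypothetical stand-in for the real API call, and storing results by page index sidesteps the ordering question either way.

```javascript
// Hypothetical stand-in for the real paginated API request.
const fetchPage = (page) =>
  new Promise((resolve) => setTimeout(() => resolve({ page, stars: [] }), 10));

function fetchPagesPooled(pageCount, limit, done) {
  const results = new Array(pageCount);
  let next = 1;     // next page number to request
  let finished = 0; // how many requests have completed

  const launch = () => {
    if (next > pageCount) return;
    const page = next++;
    fetchPage(page).then((data) => {
      results[page - 1] = data; // indexed writes keep results ordered
      finished += 1;
      if (finished === pageCount) {
        done(results); // all pages are in
      } else {
        launch(); // the completion callback sends the next request in line
      }
    });
  };

  // Prime the pool with at most `limit` in-flight requests.
  for (let i = 0; i < Math.min(limit, pageCount); i++) launch();
}
```

Because at most `limit` requests are ever in flight, this also acts as the throttle against GitHub rate limiting mentioned earlier.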
Let me know if you'd like to try to implement this feature. I can definitely help.
Sorry for the even later response 🙂.
I agree, web workers wouldn't be needed as this work is mostly IO.
Right now I am super busy rewriting git in Rust, but once I run into 'the solution' to this probably well-solved problem, I will share it here or submit that PR that makes it all happen!
Thanks again for making StarTrack, a great alternative to star history, which I believe already does this. (Maybe there is some inspiration there, too.)
Thanks for the kind words!!
I'm also very busy with other projects right now, but when I find some time I'll take a crack at it. I totally agree we can look at star history and learn how they implemented it. Shouldn't be very difficult.
BTW, nice work on gitoxide! It seems like a very interesting (and challenging) project!
Thanks to @gsaraf who kindly implemented this feature, much appreciated!
I'll close this issue now; please reopen if needed.
Didn't notice this issue! I implemented it in a slightly different way, but the result is more or less the same.
:)
Yes, I agree. Anyhow, your implementation significantly improves the loading time, so I think the goal was achieved 🔥