[Q&A] Concurrent fetching of stargazer pages?

Question

Byron opened this issue 4 years ago · 8 comments

When fetching a project like this it takes a while to obtain all information needed to produce a graph.

When looking at the network activity, it's clear that each page is fetched one after the other.

As the number of pages is known in advance, it should be possible to fetch multiple pages at the same time, speeding up the graph generation considerably.

What are your thoughts on this?

Edit: Some praise where it is due: I find myself using the star tracking a lot, as it has exactly the features I seek along with perfect usability. Thank you!

Answer 1 · 2020-07-24T07:56:14.000Z

Thank you @Byron for the kind words!
Indeed, fetching the pages is done one by one, so it might be a good idea to parallelize it.
It will create some challenges around collecting the data in the right order and limiting the number of concurrent connections to avoid GitHub blocking them (especially for repos with many stars).

Do you think you will have the time to look into this?

Answer 2 · 2020-07-24T08:47:36.000Z

Indeed, I think it could be implemented with the desired number of 'worker' threads which receive their work, the page to fetch, through some channel. The result along with the page number would be sent to some reducer which brings the results back into the right order, so a standard fan-out-fan-in should do the job. This way, the number of concurrent connections would be controlled perfectly.
As the producer runs out of pages, the send part of the channel should be closed to let all workers stop gracefully. This in turn should close the send parts of the results channel, which allows the reducer to know when it's done.
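
For illustration, the fan-out-fan-in idea described above could be sketched in TypeScript roughly as follows. This is not the project's code: `fetchPagesWithWorkers` and the `fetchPage` callback are hypothetical names, and the "channel" is approximated by a shared page counter, which is safe because JavaScript runs these async workers on a single thread.

```typescript
// Hypothetical callback type: the caller supplies how one page is fetched.
type FetchPage<T> = (page: number) => Promise<T>;

// Fan-out: start `workerCount` async workers that pull page numbers from a
// shared counter. Fan-in: each result is stored at its page index, so the
// returned array is in page order regardless of completion order.
async function fetchPagesWithWorkers<T>(
  pageCount: number,
  workerCount: number,
  fetchPage: FetchPage<T>,
): Promise<T[]> {
  if (workerCount < 1) {
    throw new RangeError("workerCount must be at least 1");
  }

  // Stand-in for the work channel: the next page number to hand out.
  // Reading and incrementing happen without an await in between, so two
  // workers can never claim the same page.
  let nextPage = 1;

  // Stand-in for the reducer: slot i holds the result of page i + 1.
  const results: T[] = new Array<T>(pageCount);

  async function worker(): Promise<void> {
    // Once the pages run out, the loop ends and the worker stops gracefully.
    while (nextPage <= pageCount) {
      const page = nextPage++;
      results[page - 1] = await fetchPage(page);
    }
  }

  // Never start more workers than there are pages.
  const workers = Array.from(
    { length: Math.min(workerCount, pageCount) },
    () => worker(),
  );

  // Resolves when every worker has drained the queue; rejects on the first error.
  await Promise.all(workers);
  return results;
}
```

With this shape, the number of concurrent requests never exceeds `workerCount`, since each worker awaits its current page before claiming the next one.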

If you could point me at the mechanism to do channels (with the desired semantics) and 'workers' in JavaScript, I would happily give it a shot :).

Thanks again!

Answer 3 · 2020-07-29T08:35:06.000Z

I'm sorry for the late response. The approach you described sounds right to me. Regarding the implementation - there is this concept of web workers which enables real parallelization in client-side JavaScript. There are numerous articles and tutorials on how to integrate web workers with React (you can google it, you'll find them easily 😃 ).
However, in this case, since these jobs are mostly IO intensive, I'm not sure this is necessary. We can just send a few requests in parallel and handle the results as they come (in a callback). This callback can also take care of sending the next request in line.
We do need to take care of organizing the results in the right order. But actually, when I think about it, maybe we should first find out if it's even needed - maybe the chart and the rest of the components can handle unordered data also, I'm not very sure.
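
A minimal sketch of that callback-driven approach, assuming a caller-supplied `fetchPage` function. All names here (`fetchPagesInFlight`, `collectInOrder`, `PageResult`) are hypothetical and not taken from the project. Results are handed to `onPage` in completion order; `collectInOrder` shows one way to restore page order afterwards, in case the chart turns out to need it.

```typescript
// Hypothetical result shape: the page number travels with the data so that
// order can be restored later if needed.
interface PageResult<T> {
  page: number;
  data: T;
}

// Keeps up to `maxInFlight` requests running. Each completed request
// reports its result via `onPage` and then starts the next page in line.
function fetchPagesInFlight<T>(
  pageCount: number,
  maxInFlight: number,
  fetchPage: (page: number) => Promise<T>,
  onPage: (result: PageResult<T>) => void,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (pageCount <= 0) {
      resolve();
      return;
    }
    let nextPage = 1;
    let completed = 0;
    let failed = false;

    const startNext = (): void => {
      if (failed || nextPage > pageCount) return;
      const page = nextPage++;
      fetchPage(page)
        .then((data) => {
          if (failed) return;
          onPage({ page, data }); // delivered in completion order, not page order
          completed++;
          if (completed === pageCount) resolve();
          else startNext(); // this callback sends the next request in line
        })
        .catch((error) => {
          failed = true; // stop scheduling further pages after the first error
          reject(error);
        });
    };

    // Start the initial batch (at least one request, at most one per page).
    const slots = Math.max(1, Math.min(maxInFlight, pageCount));
    for (let i = 0; i < slots; i++) startNext();
  });
}

// Optional ordering step: collect everything, then sort by page number.
async function collectInOrder<T>(
  pageCount: number,
  maxInFlight: number,
  fetchPage: (page: number) => Promise<T>,
): Promise<T[]> {
  const received: PageResult<T>[] = [];
  await fetchPagesInFlight(pageCount, maxInFlight, fetchPage, (r) => {
    received.push(r);
  });
  return received.sort((a, b) => a.page - b.page).map((r) => r.data);
}
```

If the chart can handle unordered data, `onPage` could update it directly and the sorting step in `collectInOrder` could be dropped.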

Let me know if you'd like to try to implement this feature. I can definitely help.

Answer 4 · 2020-08-12T12:15:26.000Z

Sorry for the even later response 😅.
I agree, WebWorkers wouldn't be needed as this work is mostly IO.
Right now I am super busy rewriting git in Rust, but once I run into 'the solution' to this probably well-solved problem, I will share it here or submit the PR that makes it all happen!

Thanks again for making StarTrack, a great alternative to star history, which I believe already does this. (maybe there is some inspiration there, too)

Answer 5 · 2020-08-13T07:51:49.000Z

thanks for the kind words!!

I'm also very busy with other projects right now, but when I find some time I'll take a crack at it. I totally agree we can look at star history and learn how they implemented it. Shouldn't be very difficult.

BTW, nice work on gitoxide! seems like a very interesting (and challenging) project!

Answer 6 · 2021-02-02T09:15:39.000Z

Thanks to @gsaraf who kindly implemented this feature, much appreciated!

I'll close this issue now, please reopen if needed

Answer 7 · 2021-02-02T09:21:08.000Z

Didn't notice this issue! I implemented it in a slightly different way, but the result is more or less the same.

:)

Answer 8 · 2021-02-02T09:26:45.000Z

> Didn't notice this issue! I implemented it in a slightly different way, but the result is more or less the same.

Yes I agree. Anyhow your implementation significantly improves the loading time so I think the goal was achieved 🥇