seladb/StarTrack-js

Stats missing recent changes

omry opened this issue · 17 comments

omry commented

https://seladb.github.io/StarTrack-js/#/preload?r=facebookresearch,hydra
Looking at the end of the graph, the last point is at around 1600 even though the project is now close to 1650:
[screenshot]

Maybe you need to zoom in a little more?

I opened the same repo and I do see 1648 stars:

[screenshot taken 2020-02-04, 5:19 PM]

omry commented

Nope, zooming didn't help.
In fact, I am still seeing the same number now. Is there any caching that might be causing this?

Yes, that might be the case. Can you please try in a different browser or in private/incognito mode?

omry commented

Yes, a private session shows the correct graph.
This is interesting because even in the regular session there is still a significant loading time.
There is probably a bug lurking here.

I will check it out. Please let me know if you see it happening again.

omry commented

Unless I somehow flush my cache, this now happens constantly for me.
I can't get the latest data unless I go incognito.

Can you open your browser's debug tools, go to "Network" and post all the requests that are going to the GitHub API?

omry commented

Overall:
[screenshot]
One of the requests looks like:

[screenshot]

Let me know if you need anything else.

Thanks for sharing this information!

Did you capture this information today? I see your repo has more than 1700 stars, so StarTrack should have called the GitHub API 18 times (100 stars per call), but it made only 16 calls and all of them were served from the cache.

Maybe you can clear the browser's cache and check again?

omry commented

Yes, my browser still shows 1600 stars (unless in incognito).
I am not sure it's a good idea to clear the cache before you understand the problem a bit more. It may take a week or so before it manifests again (if it's not solved already somehow).

Are you sure you want me to clear the cache?

I think I might have an idea of what's going on. Can you please send me the response headers of the first page? Meaning, the response headers for:

https://api.github.com/repos/facebookresearch/hydra/stargazers?per_page=100&page=1

Here's what I'm thinking:

When calling the GitHub API you get back a response header called "Link", which contains the number of the last "page". This basically says how many API calls you need to make to fetch all of the star data. For example: if a repo has 1050 stars and each call fetches data on 100 stars, you need to make 11 API calls. What I suspect is that since your GitHub API responses are cached, the cached first page still carries an old "Link" header with an outdated last page number, and that's why StarTrack doesn't fetch all the pages.

If you provide the data about the first page, I'll be able to confirm that.
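
To make the reasoning above concrete, here is a minimal sketch in Python (not StarTrack's actual code) of how the last page number can be read from a "Link" header and compared with the number of pages the current star count requires. The function names and the sample header value are hypothetical; the sample only mimics the usual `<url>; rel="..."` shape of the header.

```python
import math
import re
from typing import Optional


def last_page_from_link(link_header: str) -> Optional[int]:
    """Return the page number of the rel="last" entry, or None if there is none."""
    for match in re.finditer(r'<([^>]*)>\s*;\s*rel="([^"]*)"', link_header):
        url, rel = match.group(1), match.group(2)
        if rel == "last":
            page = re.search(r"[?&]page=(\d+)", url)
            if page:
                return int(page.group(1))
    return None


def pages_needed(star_count: int, per_page: int = 100) -> int:
    """Number of API calls needed to fetch every stargazer."""
    return math.ceil(star_count / per_page)


# Hypothetical cached header: it still claims that page 16 is the last one.
sample_link = (
    '<https://api.github.com/repos/facebookresearch/hydra/stargazers'
    '?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/repos/facebookresearch/hydra/stargazers'
    '?per_page=100&page=16>; rel="last"'
)

cached_last_page = last_page_from_link(sample_link)
expected_pages = pages_needed(1701)  # "more than 1700 stars"

# If the cached header reports fewer pages than the star count requires,
# the newest stars are never requested.
is_stale = cached_last_page is not None and cached_last_page < expected_pages
```

With a stale header like this one, only 16 pages would be requested even though 18 are needed, which would match the symptom described in this thread.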

omry commented

Unfortunately I lost access to my laptop and will only regain it sometime in the middle of next week, so I can't test this. It seems plausible.

My screenshot shows that this response is cached.

Sure, no worries, please send it when you can. The response is indeed cached; however, in the response that you posted I don't see the "Link" header. Maybe this header only comes with the first "page".

omry commented

I am already seeing some discrepancy on my home desktop.
Two ideas:

  1. Add a caching timeout to the query for the number of pages.
  2. Add an even shorter caching timeout for the last page, allowing it to get the latest stats more often.

Can you send me a screenshot of the first API request?
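
A rough sketch of what these two ideas could look like, written in Python for illustration only. The function names and all timeout values are assumptions, not anything StarTrack implements. It also assumes the page count comes from the first page's response, as suggested earlier in this thread, and that full pages in the middle rarely change.

```python
def cache_ttl_seconds(
    page: int,
    last_page: int,
    page_count_ttl: int = 3600,          # assumed: 1 hour for the page-count query
    last_page_ttl: int = 600,            # assumed: 10 minutes for the last page
    full_page_ttl: int = 7 * 24 * 3600,  # assumed: 1 week for full pages
) -> int:
    """Pick how long a cached stargazers page may be reused."""
    if page < 1 or page > last_page:
        raise ValueError("page must be between 1 and last_page")
    ttl = full_page_ttl
    if page == 1:
        # Idea 1: the response that tells us the number of pages expires sooner.
        ttl = min(ttl, page_count_ttl)
    if page == last_page:
        # Idea 2: the last page gains new stars, so it expires soonest.
        ttl = min(ttl, last_page_ttl)
    return ttl


def is_expired(fetched_at: float, now: float, ttl: int) -> bool:
    """True when a cached response is older than its allowed lifetime."""
    return now - fetched_at >= ttl
```

For a repo with 17 pages, `cache_ttl_seconds(1, 17)` gives the page-count timeout, `cache_ttl_seconds(17, 17)` gives the shorter last-page timeout, and every page in between keeps the long one.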

omry commented

I will when I get a new repro. Because I have been switching laptops in the past few days, I don't have one right now.