jstrieb/github-stats

Stats don't account for "old" contributions

BitPatty opened this issue · 5 comments

The query used to get the user's contributions apparently only includes repositories to which the user has contributed recently.

{
  viewer {
    repositoriesContributedTo(first: 100, includeUserRepositories: false, contributionTypes: [COMMIT, PULL_REQUEST, REPOSITORY, PULL_REQUEST_REVIEW]) {
      nodes {
        nameWithOwner
      }
    }
  }
}

Tested via https://docs.github.com/en/free-pro-team@latest/graphql/overview/explorer
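To reproduce this outside the explorer, here is a minimal sketch that uses only the Python standard library. It assumes a personal access token and the public endpoint https://api.github.com/graphql. The helper names build_request and repo_names are hypothetical and are not part of github-stats. The sketch only builds the request, so sending it (for example with urllib.request.urlopen) is left to the caller.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Same query as above, sent as the "query" field of a JSON body.
QUERY = """
{
  viewer {
    repositoriesContributedTo(
      first: 100,
      includeUserRepositories: false,
      contributionTypes: [COMMIT, PULL_REQUEST, REPOSITORY, PULL_REQUEST_REVIEW]
    ) {
      nodes {
        nameWithOwner
      }
    }
  }
}
"""


def build_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the query above."""
    body = json.dumps({"query": QUERY}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def repo_names(response: dict) -> list[str]:
    """Extract the nameWithOwner values from a parsed JSON response."""
    nodes = response["data"]["viewer"]["repositoriesContributedTo"]["nodes"]
    return [node["nameWithOwner"] for node in nodes]
```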

It appears that older contributions have to be queried separately, for example by scraping the user's profile, by abusing the search API, or via third-party tools such as BigQuery: https://stackoverflow.com/a/63427144
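As a rough illustration of the search API route, the sketch below builds REST search URLs for a user's pull requests and commits, and pulls owner/repo names out of issue-search results. The /search/issues and /search/commits endpoints exist, but the helper names are hypothetical. Search results are also capped (GitHub documents a limit of 1,000 results per search), so this is a partial workaround at best.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://api.github.com/search"
REPOS_PREFIX = "https://api.github.com/repos/"


def search_urls(login: str, per_page: int = 100) -> dict[str, str]:
    """Build search URLs for a user's pull requests and commits (hypothetical helper)."""
    return {
        # Issue search restricted to type:pr returns pull requests authored by login.
        "pull_requests": f"{SEARCH_BASE}/issues?"
        + urlencode({"q": f"type:pr author:{login}", "per_page": per_page}),
        # Commit search matching commits whose author is login.
        "commits": f"{SEARCH_BASE}/commits?"
        + urlencode({"q": f"author:{login}", "per_page": per_page}),
    }


def repos_from_issue_items(items: list[dict]) -> set[str]:
    """Turn issue-search result items into a set of "owner/repo" names.

    Each item is assumed to carry a repository_url of the form
    https://api.github.com/repos/OWNER/REPO.
    """
    return {
        item["repository_url"][len(REPOS_PREFIX):]
        for item in items
        if item["repository_url"].startswith(REPOS_PREFIX)
    }
```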

Hi, thanks for using the project and taking the time to open this issue!

I'm afraid I don't completely understand the problem you mention. As far as I can tell, I have implemented the API query with pagination (via the after argument), so if there are more than 100 results, it keeps looping until there are none left. It does this using the GraphQL query here (in particular line 153):

repositoriesContributedTo(
    first: 100,
    includeUserRepositories: false,
    orderBy: {{
        field: UPDATED_AT,
        direction: DESC
    }},
    contributionTypes: [
        COMMIT,
        PULL_REQUEST,
        REPOSITORY,
        PULL_REQUEST_REVIEW
    ],
    after: {"null" if contrib_cursor is None else '"' + contrib_cursor + '"'}
) {{
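The cursor-based pagination described here can be sketched as a small loop. This is not the project's code. It assumes the query also selects pageInfo { hasNextPage endCursor } (the standard fields on GitHub's GraphQL connections) and takes a hypothetical fetch_page callable that runs the query for a given cursor and returns the parsed connection.

```python
from typing import Callable, Optional

# Hypothetical: given a cursor (None for the first page), return the parsed
# "repositoriesContributedTo" connection, i.e. {"nodes": [...], "pageInfo": {...}}.
FetchPage = Callable[[Optional[str]], dict]


def all_contributed_repos(fetch_page: FetchPage) -> list[str]:
    """Follow endCursor until hasNextPage is false, collecting repo names."""
    names: list[str] = []
    cursor: Optional[str] = None  # None means "start from the first page"
    while True:
        connection = fetch_page(cursor)
        names.extend(node["nameWithOwner"] for node in connection["nodes"])
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return names
        cursor = page_info["endCursor"]
```

Passing the cursor as a GraphQL variable (for example declaring $after: String and writing after: $after) would avoid splicing quotes into the query string by hand, as the f-string fragment does. Either way, pagination only walks through what the API returns; it cannot surface repositories the API leaves out.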

Have you been finding that it is not working properly? Or has there been some other misunderstanding? I would appreciate more information so that I can better address this. Thanks!

It's not an issue with your code but rather a limitation of the GitHub API itself. Your query generally works fine; however, if you last contributed to a repository you don't own more than roughly a year ago, it won't show up in the response.

Sample Query for BigQuery:

SELECT DISTINCT repo.name
FROM `githubarchive.year.2019`
WHERE (type = 'PushEvent' OR type = 'PullRequestEvent')
  AND actor.login = 'BitPatty'
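The query above only covers 2019. To look back over several years, one option is to generate a UNION ALL over the yearly githubarchive tables. The sketch below is hypothetical: the function name and the login check are illustrative assumptions, and the login pattern only approximates GitHub's username rules so that the value can be inlined into SQL safely. Each extra year scans another large table, which may be costly.

```python
import re

# Approximation of GitHub usernames: letters, digits and hyphens, at most
# 39 characters, not starting with a hyphen. Rejecting anything else keeps
# the value safe to inline into the SQL string below.
LOGIN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,38}")


def build_contribution_sql(login: str, years: range) -> str:
    """Build a BigQuery query listing repos the user pushed to or opened PRs on."""
    if not LOGIN_RE.fullmatch(login):
        raise ValueError(f"unexpected GitHub login: {login!r}")
    if len(years) == 0:
        raise ValueError("at least one year is required")
    # One filtered SELECT per yearly table, merged with UNION ALL.
    per_year = "\n  UNION ALL\n  ".join(
        "SELECT repo.name AS name "
        f"FROM `githubarchive.year.{year}` "
        "WHERE type IN ('PushEvent', 'PullRequestEvent') "
        f"AND actor.login = '{login}'"
        for year in years
    )
    return f"SELECT DISTINCT name\nFROM (\n  {per_year}\n)"
```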

In this case, the following repository shows up: https://github.com/zenware/FizzBuzz, which includes some of my contributions.

However, the GraphQL API does not return this repository, since my last contribution to it was in 2019.

GitHub API response:

"nodes": [
  {
    "nameWithOwner": "vendure-ecommerce/vendure"
  },
  {
    "nameWithOwner": "kimeggler/spotifystatistics"
  },
  {
    "nameWithOwner": "HelveticSpeedrunners/speedrun.ch"
  },
  {
    "nameWithOwner": "swisscom/backman"
  },
  {
    "nameWithOwner": "dizzypenguins/Bonobo"
  }
]
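One way to spot the gap described here is to compare the two result sets. The sketch below uses a hypothetical helper and assumes the GraphQL nodes have the shape shown above and the BigQuery result is a set of owner/repo strings. It returns repositories that appear in the archive but not in the GraphQL response, skipping the user's own repositories because the query uses includeUserRepositories: false. Names are compared case-insensitively; renamed or transferred repositories may still show up as false positives.

```python
def missing_from_graphql(
    graphql_nodes: list[dict], archive_repos: set[str], login: str
) -> set[str]:
    """Return archive repos that the GraphQL response did not include."""
    # GitHub owner/repo names are case-insensitive, so normalize for comparison.
    returned = {node["nameWithOwner"].lower() for node in graphql_nodes}
    own_prefix = login.lower() + "/"
    return {
        repo
        for repo in archive_repos
        if repo.lower() not in returned
        # Skip the user's own repos, which includeUserRepositories: false excludes.
        and not repo.lower().startswith(own_prefix)
    }
```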

Thanks for the clarification! If I understand correctly, there isn't much I can do about this without potentially making a lot of queries to the REST API. Even then, I'm not sure that would fully address the problem, given that there are sometimes odd inaccuracies.

Do you think that adding a note to the second paragraph of the disclaimer referring to this specific issue is sufficient to make users aware of the problem? If not, how would you go about fixing it?

Yes, adjusting the logic for this specific issue would certainly take a huge effort. I might work on it myself if I find the time, but not in the near future.


Updating the docs would definitely help future users who might be as confused as I was at first about the missing contributions.

In the end, this issue is more of a "FYI" than something I'd want to be "fixed" asap.

That makes sense, thanks! I've mentioned this in the appropriate place in the README and linked back to this issue, which I will leave open. Once again, I appreciate you bringing this to my attention.