thoth-station/mi

Unable to collect Issues from repository

oindrillac opened this issue · 18 comments

Describe the bug
Unable to collect Issues from repository.

To Reproduce
Steps to reproduce the behavior:

  1. Set up a personal access token as an environment variable
  2. Run python -m srcopsmetrics.cli --create-knowledge --is-local --repository openshift/origin --entities Issue
  3. See error
oindrillachatterjee@Oindrillas-MBP mi % python -m srcopsmetrics.cli --create-knowledge --is-local --repository openshift/origin --entities Issue
INFO:srcopsmetrics.github_knowledge:Overall repositories found: 1
INFO:srcopsmetrics.bot_knowledge:######################## Analysing openshift/origin ########################

INFO:srcopsmetrics.bot_knowledge:########################
INFO:srcopsmetrics.bot_knowledge:Detected entities:
CodeFrequency # Commit # DependencyUpdate # Fork # Issue # IssueEvent # KebechetUpdateManager # License # PullRequest # PullRequestDiscussion # ReadMe # Release # Stargazer # TrafficClones # TrafficPaths # TrafficPaths # TrafficReferrers # TrafficClones # TrafficViews
INFO:srcopsmetrics.bot_knowledge:########################
INFO:srcopsmetrics.bot_knowledge:Issue inspection
INFO:srcopsmetrics.entities.tools.storage:Loading knowledge locally
INFO:srcopsmetrics.entities.tools.storage:Data from file %s loaded
INFO:srcopsmetrics.entities.interface:No previous knowledge of type Issue found
INFO:srcopsmetrics.iterator:-------------Issue Analysis-------------
WARNING:srcopsmetrics.iterator:403 {"message": "API rate limit exceeded for user ID 32435206.", "documentation_url": "https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}
WARNING:srcopsmetrics.iterator:Problem occured, cached data will be saved
INFO:srcopsmetrics.iterator:Nothing to store, no update operation needed
INFO:srcopsmetrics.bot_knowledge:

Expected behavior
The command should fetch Issues for the openshift/origin repo.

cc: @chauhankaranraj

Hello @oindrillac, thanks for the feedback!

GITHUB_ACCESS_TOKEN must be set as an environment variable holding a generated access token (which has a limit of 5,000 requests per hour). This somehow got lost from the documentation, as I see now...
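
To confirm that the token is picked up and to see how much of the quota is left, GitHub's REST API exposes a `/rate_limit` endpoint. Below is a minimal sketch using only the standard library. It is not part of mi, and the helper names `fetch_rate_limit` and `summarize_core_limit` are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def summarize_core_limit(payload: dict) -> dict:
    """Extract the core REST quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    return {
        "limit": core["limit"],
        "remaining": core["remaining"],
        # "reset" is a Unix timestamp in seconds (UTC)
        "resets_at": datetime.fromtimestamp(core["reset"], tz=timezone.utc),
    }


def fetch_rate_limit(token: str) -> dict:
    """Query https://api.github.com/rate_limit with the given token."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (needs network access and the env variable to be set):
#   import os
#   print(summarize_core_limit(fetch_rate_limit(os.environ["GITHUB_ACCESS_TOKEN"])))
```

If `remaining` is already close to zero before the run starts, the token is probably being used up somewhere else.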

Thanks @xtuchyna. We had actually figured that bit out already and set the GITHUB_ACCESS_TOKEN env var to our GitHub personal access token. With this, we were able to get PullRequest data (first screenshot), but not Issue data (second screenshot).

Screenshot from 2022-02-02 18-23-00

Screenshot from 2022-02-02 19-49-24

Any ideas what we might be missing here?

Note: I stopped the first command after a few seconds so that the GitHub API rate limit would not be exhausted by the time I started the second command.

@chauhankaranraj what version of mi are you using?
Also, do you use the token for anything else, or is it your private personal token?

@xtuchyna I was running this on version 2.0.1 of srcopsmetrics.

Also, I was using a personal access token that I had created just for this use case.

Hey @oindrillac, please update mi to the newest version, 2.10.2. Version 2.0.1 is nearly two years old and can have a lot of issues.

I updated srcopsmetrics to 2.10.2 and now get the same error with both the PullRequest and Issue entities:

$ python -m srcopsmetrics.cli --create-knowledge --is-local --repository openshift/origin --entities Issue      
INFO:srcopsmetrics.github_knowledge:Overall repositories found: 1
INFO:srcopsmetrics.bot_knowledge:######################## Analysing openshift/origin ########################

INFO:srcopsmetrics.bot_knowledge:########################
INFO:srcopsmetrics.bot_knowledge:Detected entities:
INFO:srcopsmetrics.bot_knowledge:########################
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/site-packages/srcopsmetrics/cli.py", line 215, in <module>
    cli(auto_envvar_prefix="MI")
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/srcopsmetrics/cli.py", line 178, in cli
    analyse_projects(repositories=repos, is_local=is_local, entities=entities_args)
  File "/usr/local/lib/python3.9/site-packages/srcopsmetrics/bot_knowledge.py", line 83, in analyse_projects
    raise NotKnownEntitiesError(
srcopsmetrics.exceptions.NotKnownEntitiesError: ('Invalid specified entities: %s', ['Issue'])
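
Note that the `Detected entities:` line in this log is empty, whereas the 2.0.1 log above listed Issue among the detected entities. The following is a rough sketch of how such a check could behave, assuming the requested names are compared against whatever entities were detected. `validate_entities` and the local `NotKnownEntitiesError` class are illustrative stand-ins, not the actual srcopsmetrics code.

```python
class NotKnownEntitiesError(Exception):
    """Stand-in for srcopsmetrics.exceptions.NotKnownEntitiesError."""


def validate_entities(requested, detected):
    """Raise if any requested entity name is not among the detected ones."""
    unknown = [name for name in requested if name not in detected]
    if unknown:
        # Same message/argument shape as in the traceback above
        raise NotKnownEntitiesError("Invalid specified entities: %s", unknown)
    return list(requested)
```

Under that assumption, an empty set of detected entities would reject every name passed via --entities, including valid ones such as Issue.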

@chauhankaranraj do you see the same log on this version of srcopsmetrics?

A patch is on the way, just waiting for it to be delivered to PyPI: #520
(original issue: #509)

Okay @oindrillac, the patch is delivered (sadly, we experienced some minor deployment issues that delayed it).
Please update to 2.10.4 and let me know if the issue persists.

Thanks @xtuchyna. I updated to the latest version, srcopsmetrics-2.10.4, and I no longer see the above error. However, when I try to collect the Issue entity, it gets stuck again as before. The same problem doesn't occur with the PullRequest entity.

This command has been stuck for a long time.

goern commented

/kind bug
/priority critical-urgent
/assign @xtuchyna

Just wanted to post an update here: fetching issues seems to work for smaller repos. I was able to download issues for the open-services-group/metrics repo. So the problem with the openshift/origin repository seems to be related to the large size of that repository.
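
For a repository this size, the 5,000-request quota mentioned earlier can run out mid-run, as the 403 in the first log shows. One way a long collection could cope is to pause until the quota resets, using the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that GitHub returns with REST responses. The sketch below only illustrates that idea, assuming the caller has access to those headers. `seconds_until_reset` is a hypothetical helper, not how mi handles this.

```python
import time


def seconds_until_reset(headers, now=None, min_remaining=1):
    """Return how long to sleep before the next request, in seconds.

    headers: mapping holding GitHub's X-RateLimit-Remaining and
             X-RateLimit-Reset (Unix timestamp) values.
    """
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    if remaining >= min_remaining:
        return 0.0  # quota left, no need to wait
    reset_at = int(headers["X-RateLimit-Reset"])
    # Never return a negative wait if the reset time has already passed
    return max(0.0, reset_at - now)
```

A collection loop could call `time.sleep(seconds_until_reset(response_headers))` before each request, so a large repository would take longer rather than fail with a 403.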

Hey @oindrillac, sorry for letting this issue go stale. This should be fixed by #547.
Also, there are new entities called RawIssue and RawPullRequest that you can try (they are basically a non-restrictive data aggregation of those entities).
Please let me know whether the issue still persists (if not, feel free to close the issue) :)

Hello @oindrillac, did the patch work?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@sesheta: Closing this issue.