github-changelog-generator/github-changelog-generator

Generating changelog is really slow

aelam opened this issue · 3 comments

aelam commented

Describe the bug

When I try to generate a changelog for my project, it becomes really slow after printing "Fetching closed dates for issues" and appears to be stuck there. It might be hitting the rate limit of our GHE instance, but I don't know how to check for more details.

To Reproduce
Steps to reproduce the behavior:

Found 57 tags
Fetching tags dates: 57/57
Sorting tags...
Received issues: 500
Pull Request count: 3193
Filtered pull requests: 339
Fetching events for issues and PR: 339
Fetching closed dates for issues: 339/339

The rate limit seems to be hit in GitHubChangelogGenerator::OctoFetcher#fetch_events_async, which calls the List issue events API for each issue and pull request. If you have many issues and pull requests, it's easy for github-changelog-generator to reach the rate limit. This problem could perhaps be resolved with the GraphQL API, but that would require a lot of work.
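One way to check whether the rate limit is really the cause is to look at the rate-limit headers that the GitHub REST API sends with its responses (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`). Below is a minimal sketch that summarizes them. The helper name `rate_limit_summary` and the sample values are made up for illustration, and it assumes the headers are passed in as a plain hash of strings (for example, copied from `curl -i` output).

```ruby
require "time"

# Hypothetical helper (not part of github-changelog-generator):
# turns GitHub's rate-limit response headers into a small summary hash.
def rate_limit_summary(headers)
  # Header names are case-insensitive, so normalize them first.
  h = headers.transform_keys { |key| key.to_s.downcase }
  {
    limit: Integer(h.fetch("x-ratelimit-limit")),
    remaining: Integer(h.fetch("x-ratelimit-remaining")),
    # The reset header is a Unix timestamp in seconds.
    resets_at: Time.at(Integer(h.fetch("x-ratelimit-reset"))).utc
  }
end

# Illustrative values only, not taken from a real response.
sample_headers = {
  "X-RateLimit-Limit" => "5000",
  "X-RateLimit-Remaining" => "0",
  "X-RateLimit-Reset" => "1663200000"
}

summary = rate_limit_summary(sample_headers)
puts "#{summary[:remaining]}/#{summary[:limit]} requests left, " \
     "resets at #{summary[:resets_at].iso8601}"
```

If `remaining` is 0 while the generator appears stuck, it is probably waiting for the reset time rather than hanging.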

As far as I can see from the code, issue events are only used to detect the closed date via #find_closed_date_by_commit. I suppose github-changelog-generator could just take merged_at from the pull request response as the closed date, but going forward with this strategy would be a breaking change.
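The idea of taking the date directly from the response could look roughly like the sketch below. This is not the generator's actual code: `closed_date_from_response` is a hypothetical helper, and the timestamps in the example are placeholders. It only assumes that the item is a hash with the REST API's `merged_at` and `closed_at` fields.

```ruby
require "time"

# Hypothetical helper (not part of github-changelog-generator):
# picks the closed date from the issue/PR response itself, so no
# extra "list issue events" request is needed per item.
def closed_date_from_response(item)
  # Merged PRs carry merged_at; plain issues only have closed_at.
  raw = item["merged_at"] || item["closed_at"]
  raw && Time.iso8601(raw)
end

# Placeholder data for illustration.
merged_pr = { "merged_at" => "2022-09-10T00:00:00Z", "closed_at" => "2022-09-10T00:00:00Z" }
open_issue = { "closed_at" => nil }

puts closed_date_from_response(merged_pr).inspect
puts closed_date_from_response(open_issue).inspect
```

Because the result no longer comes from the commit's author date, existing changelogs could shift entries between releases, which is why this would be a breaking change.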

aelam commented

Thanks for your answer!

I'm not sure if there is any difference between closed_at and merged_at for PRs; they should be the same, if I understand correctly?
If they are not the same, it usually means the PR was closed without merging, which means the PR will not be included anyway.

Yes, closed_at and merged_at for PRs are mostly the same. On the other hand, #find_closed_date_by_commit takes commit.author.date and treats it as the date the issue or PR was closed. commit.author.date can differ from closed_at or merged_at for PRs. Here is an example that shows the difference.

rails/rails#45885 was merged on Sep. 10, but the commit date is Aug. 25. github-changelog-generator will label the PR with 2022-08-25 in CHANGELOG.md.
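To put a number on that gap, here is a tiny sketch using only the two calendar dates mentioned above. The constant and method names are made up for illustration.

```ruby
require "date"

# Dates for rails/rails#45885 as described above.
MERGED_ON = Date.new(2022, 9, 10)   # when the PR was merged
AUTHORED_ON = Date.new(2022, 8, 25) # commit author date used in CHANGELOG.md

# Number of whole days between the two dates.
def date_gap_in_days(merged_on, authored_on)
  (merged_on - authored_on).to_i
end

puts date_gap_in_days(MERGED_ON, AUTHORED_ON) # prints 16
```

A gap of this size is enough to place the PR under a different release if a tag was created in between.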

You can get the same result in the GitHub GraphQL Explorer with this query.

query {
  repository(name: "rails", owner: "rails") {
    pullRequest(number: 45885) {
      mergedAt
      closedAt
      commits(last: 1) {
        edges {
          node {
            commit {
              id
              author {
                date
                email
                name
              }
            }
          }
        }
      }
    }
  }
}
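If you would rather run the query from a script than from the Explorer, the sketch below builds the HTTP request with Ruby's standard library. It is only a sketch under assumptions: it targets the public `https://api.github.com/graphql` endpoint (a GHE instance would use its own host), `build_graphql_request` is a hypothetical helper, the query is trimmed to the date fields, and the token is expected in a `GITHUB_TOKEN` environment variable.

```ruby
require "json"
require "net/http"
require "uri"

# Public GitHub GraphQL endpoint; an assumption for this sketch.
GRAPHQL_ENDPOINT = URI("https://api.github.com/graphql")

# Same query as above, trimmed to the date fields being compared.
PR_DATES_QUERY = <<~GRAPHQL
  query {
    repository(name: "rails", owner: "rails") {
      pullRequest(number: 45885) {
        mergedAt
        closedAt
        commits(last: 1) {
          edges { node { commit { author { date } } } }
        }
      }
    }
  }
GRAPHQL

# Hypothetical helper: builds (but does not send) the POST request.
def build_graphql_request(query, token)
  request = Net::HTTP::Post.new(GRAPHQL_ENDPOINT)
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(query: query)
  request
end

# Sending it requires network access and a valid token:
#   request = build_graphql_request(PR_DATES_QUERY, ENV.fetch("GITHUB_TOKEN"))
#   response = Net::HTTP.start(GRAPHQL_ENDPOINT.host, GRAPHQL_ENDPOINT.port, use_ssl: true) do |http|
#     http.request(request)
#   end
#   puts JSON.pretty_generate(JSON.parse(response.body))
```

The response should contain mergedAt, closedAt, and the last commit's author date side by side, which makes the difference easy to compare.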