psf/gh-migration

Issue (re)numbering

Closed this issue · 9 comments

GitHub uses the same namespace for issues and PRs, and the current PR numbers already overlap with the original bpo numbers.

Current situation:

  • bpo issues (as of 2020-10-23):
    • open: 7608
    • closed: 46258
    • total: 53866
  • Used bpo ranges:
    • 1000-42000+ (~7500 open, ~40800 total, mostly contiguous)
    • 207608-1779871 (178 open, 12914 total, non-contiguous, old SourceForge issues)
  • Used GitHub range (PRs):
    • 1-22500+ (~1400 open, ~21500 closed, contiguous)

Questions and issues:

  • Is there a way to preserve the bpo numbering?
    • Can we renumber PRs instead?
    • Can we separate the namespace for issues and PRs?
    • Does GitHub offer other solutions/options?
  • Should we renumber old SourceForge issues?
    • could be condensed into a contiguous block of ~13k issues
    • can be placed just before/after the bpo block
  • If we renumber the issues, what pattern should we follow?
    • 1-23k PRs, 87k-100k old SF issues (condensed, renumbered), 101k-142k bpo issues (original_number + 100k), >143k new GH issues/PRs
    • 1-23k PRs, 27k-40k old SF issues, 41k-82k bpo issues (original_number + 40k), >82k new GH issues/PRs
    • 1k-42k bpo issues (original_number), 43k-56k old SF issues, 57k-80k current PRs (if PRs can be renumbered), >80k new GH issues/PRs
    • other options?
    • Almost-to-scale representation of the three patterns, for the visually-inclined folks:
      0          23k 27k   41k                     87k    101k                  143k
      |PRPRPRPRPRP|_________________________________|SFSFSF|BPOBPOBPOBPOBPOBPOBPO|NEW...|
      |PRPRPRPRPRP|__|SFSFSF|BPOBPOBPOBPOBPOBPOBPO|NEW...|
      _|BPOBPOBPOBPOBPOBPOBPO|SFSFSF|PRPRPRPRPRP|NEW...|
       1k                   43k    57k         80k 
      
      Every char is ~2k issues. PR: current PRs; SF: old SourceForge issues; BPO: current BPO issues; NEW...: new issues; _: unused range.
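The two building blocks behind these patterns, condensing the scattered SF IDs into one block and shifting bpo IDs by a fixed offset, can be sketched as follows. This is only an illustration: `condense` and `shift` are hypothetical helper names, and the start value and offset below are taken from the second pattern (27k start for SF issues, original_number + 40k for bpo issues).

```python
def condense(sf_ids, start):
    """Map scattered old SF IDs onto a contiguous block beginning at `start`.

    The relative order of the old IDs is preserved.
    """
    return {old: start + i for i, old in enumerate(sorted(sf_ids))}


def shift(bpo_id, offset):
    """Renumber a bpo issue with a fixed offset, so no mapping table is needed."""
    return bpo_id + offset


# Three SF IDs taken from the ranges mentioned in this issue.
sf_mapping = condense([400503, 207608, 233790], start=27_000)
print(sf_mapping)
print(shift(1000, 40_000))
```

With a fixed offset the new bpo number can be worked out by hand, whereas the condensed SF numbers can only be recovered through the mapping.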

Other considerations:

  • The numbers of all new issues/PRs will follow the highest number in the repo (max(issue_ids) + 1) [confirm with GH].
  • If we renumber bpo issues by using original_number + 40k or original_number + 100k it will be easier to find the corresponding issue without relying on a mapping.
  • Trying to do the same for old SF issues is probably not worth it (they are scattered over a wide range and have high IDs).
  • There are references to existing bpo issue numbers that might need to be updated if the issues are renumbered.
  • Updating issue references in other issues can be done while migrating.
  • Updating issue references in code comments might not be necessary (people can find the bpo issue and from there the corresponding GH issue).
  • If bpo is kept alive, a link to the corresponding GH issue can be added to the bpo issue; if not, a redirect script that maps old IDs to new IDs should be used.
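A redirect script along the lines of the last point might look roughly like this. It is a sketch under several assumptions: the target URL pattern, the `SF_ID_START` cut-off (chosen because the old SF IDs listed above start at 207608), and the names `redirect_target` and `sf_mapping` are all hypothetical, and it presumes bpo issues are renumbered with a fixed offset while SF issues need an explicit mapping.

```python
# Assumed destination; which repo would host the issues was still undecided.
ISSUE_URL = "https://github.com/python/cpython/issues/{number}"

# Assumption: IDs below this value are bpo-era IDs, higher ones are old SF IDs.
SF_ID_START = 200_000


def redirect_target(old_id, bpo_offset, sf_mapping):
    """Return the GitHub URL for an old tracker ID, or None if it is unknown."""
    if old_id < SF_ID_START:
        # bpo issues: fixed offset, no lookup table required.
        new_id = old_id + bpo_offset
    else:
        # old SF issues: scattered IDs, so an explicit mapping is needed.
        new_id = sf_mapping.get(old_id)
    if new_id is None:
        return None
    return ISSUE_URL.format(number=new_id)


print(redirect_target(1000, 40_000, {}))
print(redirect_target(207608, 40_000, {207608: 27_000}))
```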

Update

  • After talking with GitHub, it appears that it is not possible to import the issues into our current repo using the current tools; they should be imported into a separate repo instead.
  • This solves the problem with the (re)numbering (except for the SF issues, which should probably be renumbered and condensed).
  • It is also possible to redirect users elsewhere using "issue templates" (e.g. VueJS uses this issue template file to redirect users to different pages -- see documentation here.)
  • ❓ Can the redirect be automated? Will issues/PR references still work?

So old issues will be in a different repo. But will we be able to create new issues in the cpython repo? (I hope so, so at least for new issues things look “normal”.)

Ideally it would be better to have all issues -- both old and new -- in the same place. If it's not possible to have them in the python/cpython repo, it might be better to have them all in a separate repo (e.g. python/cpython-issues). I'm investigating with GitHub what will be the consequences of having issues and PRs in two separate repos, and I already thought about a few potential problems/solutions:

  • Issue/PR references default to the current repo, so we will either need to specify the correct repo explicitly (e.g. python/cpython#1) or use different prefixes (e.g. GH- for issues and PR- for PRs) and bots/actions that fix the URL to point to the correct repo.
  • It is possible to create a link in the issue tab that points to the correct repo (e.g. what the vuejs repo does), but redirecting automatically to the right repo when the user clicks on the "Issues" tab will save a click and make the experience more seamless. I don't know yet if there is a way to do it.
  • The issue repo will also have to link back to the PR repo, ideally directly from the "Pull requests" tab.
  • If we are going to use projects/milestones, we might need to track both PRs and issues, and having them in two different repos might create other problems.
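The prefix idea from the first point could be handled by a small rewriting step in a bot. The sketch below is purely illustrative: `expand_refs` is a hypothetical name, and the prefix-to-repo table assumes the example layout floated in this thread (issues in python/cpython-issues, PRs in python/cpython).

```python
import re

# Hypothetical table; "python/cpython-issues" is only the example name used above.
REPO_FOR_PREFIX = {"GH": "python/cpython-issues", "PR": "python/cpython"}

# Matches references such as GH-123 or PR-456 as whole words.
REF_PATTERN = re.compile(r"\b(GH|PR)-(\d+)\b")


def expand_refs(text):
    """Rewrite prefixed references into explicit owner/repo#number form."""
    def replace(match):
        prefix, number = match.group(1), match.group(2)
        return f"{REPO_FOR_PREFIX[prefix]}#{number}"

    return REF_PATTERN.sub(replace, text)


print(expand_refs("Fixes GH-123 via PR-456"))
```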

It might also be possible to merge the two repos down the line, so I'm also investigating if this is a realistic possibility and if there is anything we can do to make this easier/possible in the future.

FTR I looked at the last 50 issues on bpo sorted by activity: more than 1/3 were created over a year ago, and more than 1/5 are over 3 years old. If we keep old issues in a separate repo, we will have to go back and forth for a long time.

Isn't it possible to transfer the issues from one GitHub repo to another later?

It should be possible, and in fact we are planning to import to an empty test repo first, and then transfer the issues to the python/cpython repo (it is not possible to import into an existing repo). I still have to do some testing to verify this and make sure we can preserve the issue ID while transferring issues.

FTR I just verified that after an import, creating a new issue starts from max(issue_ids) + 1. We probably want to renumber the old SF issues so that we don't end up with 7-digit issue IDs.

Makes sense.

The transfer tool doesn't support preserving issue IDs and will just assign IDs incrementally starting from the highest PR id plus one. In theory we could still try to create a fake issue with e.g. ID 99999 so that the first imported issue will have ID 100000 and a fixed offset, but this is very error prone and probably not worth it.

If possible, it would be better to at least preserve the assumption that the ID order matches the creation order.

The issues will be migrated by creation date, and will take the first available ID.

This means that:

  • 1 to ~32k are the existing PRs
  • ~32k to ~45k will be the old SF issues
  • ~45k to ~86k will be the bpo issues
  • ~86k+ will be new issues/PRs
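The "migrate by creation date, take the first available ID" rule can be sketched as below. This is not the actual migration tool: `assign_ids` is a hypothetical helper, and breaking ties on the old ID when two issues share a creation date is an assumption.

```python
from datetime import datetime


def assign_ids(issues, first_free_id):
    """Assign consecutive new IDs in creation order.

    `issues` is an iterable of (old_id, created) pairs. Ties on the creation
    date are broken by the old ID (an assumption, not a documented rule).
    """
    ordered = sorted(issues, key=lambda item: (item[1], item[0]))
    return {old_id: first_free_id + i for i, (old_id, _) in enumerate(ordered)}


issues = [
    (222588, datetime(2000, 11, 16, 14, 15, 39)),
    (222589, datetime(2000, 11, 16, 14, 11, 41)),
]
# 222589 was created first, so it receives the lower new ID.
print(assign_ids(issues, first_free_id=32_000))
```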

There will also be links from the GitHub issues to the bpo issues and vice versa.


I also checked whether the issue IDs matched the creation date and found a few exceptions among the old SF issues where an issue with a lower ID has a more recent creation date than one with a higher ID:
222588 2000-11-16.14:15:39
222589 2000-11-16.14:11:41

233790 2001-02-23.18:02:27
400503 2000-06-06.02:40:44

404275 2001-02-26.13:10:42
404276 2001-02-26.13:10:25

406292 2001-03-06.13:46:02
406293 2001-03-06.13:45:37

406295 2001-03-06.13:46:17
406296 2001-03-06.13:45:52

406297 2001-03-06.13:46:25
406298 2001-03-06.13:46:15

406298 2001-03-06.13:46:15
406299 2001-03-06.13:46:07

406301 2001-03-06.13:46:57
406302 2001-03-06.13:46:50

406304 2001-03-06.13:48:34
406305 2001-03-06.13:48:13

406311 2001-03-06.13:49:20
406312 2001-03-06.13:48:55

406318 2001-03-06.13:56:07
406319 2001-03-06.13:56:00

406321 2001-03-06.13:56:47
406322 2001-03-06.13:56:28

406324 2001-03-06.13:57:10
406325 2001-03-06.13:57:03

431772 2001-06-10.07:55:33
431848 2001-06-09.03:52:00

432208 2001-06-11.21:24:15
432247 2001-06-10.11:06:31

There were also 6 more pairs that were created at the same time and had the same creation date:
494620 2001-12-18.15:32:58
494622 2001-12-18.15:32:58

515026 2002-02-08.22:22:16
515027 2002-02-08.22:22:16

2710 2008-04-28.19:44:50
2711 2008-04-28.19:44:50

2758 2008-05-04.17:42:06
2759 2008-05-04.17:42:06

5632 2009-03-31.21:01:37
5633 2009-03-31.21:01:37

8696 2010-05-12.14:27:09
8697 2010-05-12.14:27:09
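A check like this can be reproduced with a short script. The sketch below is one possible approach, not necessarily how the lists above were produced: it sorts the issues by ID and compares each one with the next-higher ID. `parse` and `check_order` are hypothetical names, and the input format is the "ID date" layout shown above. The SF and bpo ranges should be checked separately, since SF IDs are higher than bpo IDs even though the SF issues are older.

```python
from datetime import datetime


def parse(line):
    """Parse one 'ID YYYY-MM-DD.HH:MM:SS' line."""
    issue_id, stamp = line.split()
    return int(issue_id), datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S")


def check_order(lines):
    """Compare each issue with the next-higher ID; return (inversions, ties)."""
    issues = sorted(parse(line) for line in lines)
    inversions, ties = [], []
    for (low_id, low_ts), (high_id, high_ts) in zip(issues, issues[1:]):
        if low_ts > high_ts:
            inversions.append((low_id, high_id))  # lower ID, later creation date
        elif low_ts == high_ts:
            ties.append((low_id, high_id))  # identical creation date
    return inversions, ties


sample = [
    "222588 2000-11-16.14:15:39",
    "222589 2000-11-16.14:11:41",
    "494620 2001-12-18.15:32:58",
    "494622 2001-12-18.15:32:58",
]
inversions, ties = check_order(sample)
print(inversions)
print(ties)
```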

That is great.