Map bpo issue metadata to GitHub fields/labels
This issue is about issue metadata (priority, versions, status, etc.), how/where to import them in GitHub, and what metadata to keep/add/remove/update. User/comment/file metadata will be discussed in a separate issue.
bpo tracks different metadata for each issue (see e.g. https://bugs.python.org/issue2771 ) including: title, comments, files (attachments), creator, creation, actor, activity, type, stage, components, versions, status, resolution, dependencies, superseder, assigned to, nosy list, priority, keywords, remote HG repos, linked PRs
The meaning of each field is explained in the devguide. The fields are defined in the schema.py of the bpo instance. The creator, creation (datetime), (last) actor, (last) activity (datetime) are common to all classes.
- GitHub already has corresponding fields for the following: title, messages (comments), linked PRs, assigned to (assignees), creator (user) and creation (created_at).
- bpo stores messages as a list of id on the issue, GitHub has a separate list of comments linked to the issue
- GitHub issues have a body that contains the first comment
- Linked PRs seem to be generated automatically at runtime, not at import/export time
- ❓ Does GitHub have fields for (last) actor, (last) activity (datetime)? Do we need them?
- ✔️ there is an updated_at field (datetime), but no last actor. We probably don't need the last actor.
The other fields will need to be replaced with something else (mostly labels) or removed.
Labels in GitHub can be grouped with colors and/or with a prefix like `priority-high`, `priority-medium`, `priority-low`. GitHub is working on adding custom fields, but they will only be available in ~6 months.
Actions can be used to automate certain tasks in addition to or instead of bots (e.g. adding labels, closing stale issues, etc.).
Unused metadata that are not converted to labels (or anything else) can be stored in a comment so that they can be retrieved if needed (e.g. if we move away from GH).
On python/cpython there are currently 32 labels:
- 5 stage labels (yellow), apparently set by bedevere-bot: `awaiting change review`, `awaiting changes`, `awaiting core review`, `awaiting merge`, `awaiting review`
- 6 type-related (blue/red) labels: `type-bugfix`, `type-documentation`, `type-enhancement`, `type-performance`, `type-security`, `type-tests`
- 5 version-related (gray) labels for backports (used by bots): `needs backport to 3.6` through `needs backport to 3.10`
- 5 more labels used by bots: `automerge`, `DO-NOT-MERGE`, `skip issue`, `skip news`, `test-with-buildbots`
- 2 CLA-related labels (used by bots): `CLA not signed`, `CLA signed`
- 2 OS-related labels: `OS-mac`, `OS-windows`
- 7 more misc labels: `invalid`, `ctypes`, `dependencies`, `expert-asyncio`, `spam`, `sprint`, `stale`
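As a rough illustration of the prefix-based grouping mentioned above, the sketch below buckets label names by a small set of known prefixes. The prefix set and the `group_by_prefix` helper are assumptions made for this example, not part of any existing bot; an explicit prefix list is used because labels such as `DO-NOT-MERGE` contain hyphens without being namespaced.

```python
from collections import defaultdict

# Assumed set of namespace prefixes; adjust to whatever convention is adopted.
KNOWN_PREFIXES = ("type", "OS", "expert", "priority")


def group_by_prefix(labels, prefixes=KNOWN_PREFIXES):
    """Group label names by prefix; anything unprefixed goes under 'other'."""
    groups = defaultdict(list)
    for label in labels:
        prefix, sep, _rest = label.partition("-")
        # Only treat the text before the first hyphen as a namespace
        # when it is one of the known prefixes.
        key = prefix if sep and prefix in prefixes else "other"
        groups[key].append(label)
    return dict(groups)


sample = ["type-bugfix", "type-security", "OS-mac", "DO-NOT-MERGE", "spam"]
grouped = group_by_prefix(sample)
print(grouped)
```

With this approach, adding a new namespace only requires extending the prefix tuple.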
This is the full list of all the fields we have in Roundup, and how we could convert them to GitHub Issues:
- creator, creation, activity
- title
- The exporter creates an event when the title has been updated
- comments
- The exporter exports comment author, content, and date.
- See #3 for more info on the msg content.
- files (attachments)
- Files will still be hosted on bpo
- The exporter will create direct links to the files
- assigned to
- The exporter sets the Assignees field and creates events when the assignee changes
- linked PRs
- These can not be imported and the list can't be populated automatically
- PRs are now listed in the table at the top of each imported issue
- nosy list
- To replace the nosy list users can (un)subscribe to individual issues, and can be @mentioned.
- The nosy list users are listed/mentioned in the table at the top of each issue, but this doesn't affect subscriptions.
- ❓ How can we preserve the initial nosy list? @mention all nosy list users in the first message?
- ✅ it's possible to subscribe people to the issue without sending out any notification when the issues are imported, and enabling notifications afterwards so that they will get updates.
- ❌ Subscribing other people is not possible, but it might be possible to retrigger mentions by editing the imported messages to have them notified.
- #12 might also help
- ❓ How can we replace the nosy autocomplete?
- ✅ probably not possible, but GitHub suggests reviewers and there is a CODEOWNERS file
- ❓ Can we automatically add people when a certain label is added?
- ✔️ this is now possible, see #16
- dependencies
- ❓ What options do we have to track dependencies with GitHub? (Projects might be one way, but they are probably overkill for simpler cases -- other ways?)
- ❌ currently there is no built-in support for dependencies, GitHub might add it later.
- ✔️ It is now possible to add a checkbox list of issues, and GitHub will track them as tasks (won't enforce closing all the dependencies before closing the issue though)
- ❌ this doesn't work in tables, so either we list them in a table as a plain list with no checkboxes, or the list of deps should be moved after the table. Since these are `bpo-xxxxx` issues, even if they are moved after the table the checkboxes won't be updated automatically.
- Dependencies are now listed in the table at the top
- Projects/milestones could also be used to track complex issues that are broken down into multiple issues.
- superseder
- ❓ Does GitHub have a way to mark an issue as duplicate?
- ✅ writing `Duplicate of #xxxxx` as a reply marks the issue as duplicate. A default "duplicate" reply can also be added to the saved replies (the icon with the left-pointing arrow on the top-right).
- ❌ This doesn't work with `bpo-xxxxx` refs, so it can't be used for imported issues
- ✅ we might be able to replace the `bpo-xxxxx` ref with a GH ref after the migration
- The superseder is now included in the table at the top
- remote HG repos
- These are mostly outdated and haven't been migrated.
- If the link still works, these should be converted to a PR (or a patch)
- ❓ Do we need to import the link into GitHub?
- there are currently 340 valid links and 228 unique ones
- of the 228 unique ones, 88 are reachable, 125 are `404`, and 14 are unreachable
- of the 88 that are reachable, 55 are hg.python.org links, 26 are GH/Gist links (so invalid HG links, but might contain a valid patch/branch), and 7 link to other repos
- I could add a "linked repos" row to the table, a simple link to the bpo issue that says "There are repos with patches linked to the original issue", or just ignore them.
- type
- There are currently 7 types on bpo: behavior, crash, compile error, resource usage, security, performance, enhancement
- There are currently 6 type-* labels on GitHub: type-bugfix, type-documentation, type-enhancement, type-performance, type-security, type-tests
so:
- type-bugfix seems to replace behavior, crash, compile error
- type-enhancement, type-performance, and type-security replace the corresponding fields
- resource usage is gone (possibly included in type-performance)
- type-tests and type-documentation are set automatically for `test_*.py` and `*.rst` files (not sure if they should be types -- they were components on bpo and got added in python/bedevere#108)
- stage
- There are currently 6 stages: test needed, needs patch, patch review, commit review, backport needed, resolved
- See also this (old) proposed structure and this discussion
- The stage could use the existing stage labels. An `awaiting triaging` label might be added.
- status
- There are currently 3 statuses: open, pending, closed
- Events are now created for closed/reopened issues
- Issues are labeled with the stale label when pending
- components
- There are currently 27 components: 2to3 (2.x to 3.x conversion tool), Argument Clinic, asyncio, Build, C API, Cross-Build, ctypes, Demos and Tools, Distutils, Documentation, email, Extension Modules, FreeBSD, IDLE, Installation, Interpreter Core, IO, Library (Lib), macOS, Regular Expressions, SSL, Subinterpreters, Tests, Tkinter, Unicode, Windows, XML
- ❓ People can be automatically added to the nosy list when a component is selected, can we automatically do the same with labels?
- ✔️ now we can, see #16
- versions
- There are currently 5 versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
- Versions need to be added/removed as new versions of Python are released/retired.
- ❓ Do we want to keep versions?
- resolution
- There are currently 11 resolutions: duplicate, fixed, not a bug, later, out of date, postponed, rejected, remind, wont fix, works for me, third party
- ❓ Do we want to keep resolutions?
- priority
- There are currently 6 priorities: release blocker, deferred blocker, critical, high, normal, low
- We might be able to get rid of this field and use milestones for release/deferred blocker.
- ❓ Can we automatically warn release managers somehow?
- ✅ if we keep the release/deferred blocker labels we could set autonosy for the RMs (see #16)
- ✅ we could use milestones/projects to track release/deferred blockers for each release and the RMs can use/follow those more easily.
- keywords
- There are currently 17 keywords: 3.2regression, 3.3regression, 3.4regression, 3.5regression, 3.6regression, 3.7regression, 3.8regression, 3.9regression, buildbot, easy, easy (C), gsoc, needs review, newcomer friendly, patch, pep3121, security_issue
- ❓ Do we want to keep any of these?
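The superseder and dependencies items above mention that `bpo-xxxxx` refs might be replaced with GitHub refs after the migration. A minimal sketch of that rewrite follows, assuming a bpo-to-GitHub number mapping is available once the import is done. The `rewrite_bpo_refs` helper and the numbers in the mapping are hypothetical.

```python
import re

# Matches refs such as "bpo-12345" and captures the issue number.
BPO_REF = re.compile(r"\bbpo-(\d+)\b")


def rewrite_bpo_refs(text, bpo_to_gh):
    """Replace bpo-NNNNN refs with #MMMMM when the mapping knows the issue."""

    def repl(match):
        gh_number = bpo_to_gh.get(int(match.group(1)))
        # Leave the ref untouched if the bpo issue was not migrated.
        return f"#{gh_number}" if gh_number is not None else match.group(0)

    return BPO_REF.sub(repl, text)


# Hypothetical mapping from bpo issue ids to GitHub issue numbers.
mapping = {12345: 678}
rewritten = rewrite_bpo_refs("Duplicate of bpo-12345, see also bpo-99999", mapping)
print(rewritten)
```

Unknown refs are kept as-is, so the rewrite can be run repeatedly as more issues get imported.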
An additional consideration: metadata on bpo supports searching with powerful filtering and ordering, also with the ability to save searches. GitHub's search is poorer here, so browsing and searching issues after the migration will be less pleasant 😦 Maybe all types, components and keywords will need to be converted to labels, which will make a huge flat list. Maybe a custom page (javascript app) will be needed to offer a better search experience.
Here are some stats on bpo fields usage, that might help decide which ones to keep. The total for each table might add up to more than 100% if issues have more than one label (e.g. multiple components or versions).
Issues (open/all): 7262/57119
type
bpo field | open | all |
---|---|---|
behavior | 2807 (38.7%) | 17747 (31.1%) |
enhancement | 2472 (34.0%) | 11468 (20.1%) |
crash | 184 ( 2.5%) | 2210 ( 3.9%) |
compile error | 161 ( 2.2%) | 1381 ( 2.4%) |
performance | 156 ( 2.1%) | 1182 ( 2.1%) |
resource usage | 78 ( 1.1%) | 890 ( 1.6%) |
security | 65 ( 0.9%) | 464 ( 0.8%) |
Total | 5923 (81.6%) | 35342 (61.9%) |
stage
bpo field | open | all |
---|---|---|
patch review | 2099 (28.9%) | 2884 ( 5.0%) |
needs patch | 886 (12.2%) | 1623 ( 2.8%) |
test needed | 297 ( 4.1%) | 874 ( 1.5%) |
resolved | 73 ( 1.0%) | 27057 (47.4%) |
commit review | 21 ( 0.3%) | 288 ( 0.5%) |
backport needed | 1 ( 0.0%) | 2 ( 0.0%) |
Total | 3377 (46.5%) | 32728 (57.3%) |
components
bpo field | open | all |
---|---|---|
Library (Lib) | 2738 (37.7%) | 16043 (28.1%) |
Documentation | 1054 (14.5%) | 8726 (15.3%) |
Interpreter Core | 630 ( 8.7%) | 7853 (13.7%) |
Windows | 479 ( 6.6%) | 3162 ( 5.5%) |
Extension Modules | 360 ( 5.0%) | 3176 ( 5.6%) |
Tests | 350 ( 4.8%) | 3483 ( 6.1%) |
asyncio | 279 ( 3.8%) | 970 ( 1.7%) |
IDLE | 274 ( 3.8%) | 1479 ( 2.6%) |
Build | 271 ( 3.7%) | 2641 ( 4.6%) |
email | 160 ( 2.2%) | 447 ( 0.8%) |
IO | 140 ( 1.9%) | 644 ( 1.1%) |
macOS | 119 ( 1.6%) | 1253 ( 2.2%) |
ctypes | 117 ( 1.6%) | 477 ( 0.8%) |
C API | 105 ( 1.4%) | 274 ( 0.5%) |
Unicode | 102 ( 1.4%) | 950 ( 1.7%) |
Installation | 96 ( 1.3%) | 789 ( 1.4%) |
Tkinter | 94 ( 1.3%) | 821 ( 1.4%) |
SSL | 63 ( 0.9%) | 316 ( 0.6%) |
XML | 58 ( 0.8%) | 457 ( 0.8%) |
2to3 (2.x to 3.x conversion tool) | 57 ( 0.8%) | 342 ( 0.6%) |
Cross-Build | 54 ( 0.7%) | 161 ( 0.3%) |
Demos and Tools | 44 ( 0.6%) | 512 ( 0.9%) |
Subinterpreters | 38 ( 0.5%) | 72 ( 0.1%) |
Regular Expressions | 37 ( 0.5%) | 519 ( 0.9%) |
Argument Clinic | 36 ( 0.5%) | 123 ( 0.2%) |
FreeBSD | 9 ( 0.1%) | 33 ( 0.1%) |
Parser | 9 ( 0.1%) | 35 ( 0.1%) |
Distutils | 5 ( 0.1%) | 1141 ( 2.0%) |
Total | 7778 (107.1%) | 56899 (99.6%) |
versions
bpo field | open | all |
---|---|---|
Python 3.8 | 2046 (28.2%) | 6851 (12.0%) |
Python 3.9 | 1845 (25.4%) | 5067 ( 8.9%) |
Python 3.7 | 1706 (23.5%) | 7442 (13.0%) |
Python 3.10 | 1452 (20.0%) | 3508 ( 6.1%) |
Python 3.6 | 1390 (19.1%) | 7054 (12.3%) |
Python 3.11 | 541 ( 7.4%) | 1203 ( 2.1%) |
Total | 8980 (123.7%) | 31125 (54.5%) |
resolution
bpo field | open | all |
---|---|---|
fixed | 21 ( 0.3%) | 24291 (42.5%) |
not a bug | 11 ( 0.2%) | 6178 (10.8%) |
duplicate | 7 ( 0.1%) | 3720 ( 6.5%) |
wont fix | 7 ( 0.1%) | 2295 ( 4.0%) |
third party | 7 ( 0.1%) | 701 ( 1.2%) |
remind | 5 ( 0.1%) | 18 ( 0.0%) |
out of date | 4 ( 0.1%) | 3145 ( 5.5%) |
postponed | 4 ( 0.1%) | 114 ( 0.2%) |
works for me | 4 ( 0.1%) | 952 ( 1.7%) |
later | 3 ( 0.0%) | 154 ( 0.3%) |
rejected | 3 ( 0.0%) | 2801 ( 4.9%) |
Total | 76 ( 1.0%) | 44369 (77.7%) |
priority
bpo field | open | all |
---|---|---|
normal | 6951 (95.7%) | 51387 (90.0%) |
low | 229 ( 3.2%) | 2483 ( 4.3%) |
high | 55 ( 0.8%) | 1583 ( 2.8%) |
critical | 10 ( 0.1%) | 449 ( 0.8%) |
release blocker | 2 ( 0.0%) | 933 ( 1.6%) |
deferred blocker | 1 ( 0.0%) | 107 ( 0.2%) |
Total | 7248 (99.8%) | 56942 (99.7%) |
keywords
bpo field | open | all |
---|---|---|
patch | 2878 (39.6%) | 25886 (45.3%) |
easy | 202 ( 2.8%) | 2139 ( 3.7%) |
needs review | 84 ( 1.2%) | 928 ( 1.6%) |
newcomer friendly | 17 ( 0.2%) | 98 ( 0.2%) |
easy (C) | 11 ( 0.2%) | 75 ( 0.1%) |
3.5regression | 10 ( 0.1%) | 60 ( 0.1%) |
pep3121 | 8 ( 0.1%) | 57 ( 0.1%) |
buildbot | 7 ( 0.1%) | 328 ( 0.6%) |
3.6regression | 7 ( 0.1%) | 47 ( 0.1%) |
3.8regression | 6 ( 0.1%) | 55 ( 0.1%) |
3.3regression | 3 ( 0.0%) | 79 ( 0.1%) |
3.7regression | 3 ( 0.0%) | 60 ( 0.1%) |
3.9regression | 3 ( 0.0%) | 36 ( 0.1%) |
3.10regression | 3 ( 0.0%) | 26 ( 0.0%) |
gsoc | 2 ( 0.0%) | 19 ( 0.0%) |
3.2regression | 2 ( 0.0%) | 31 ( 0.1%) |
security_issue | 2 ( 0.0%) | 35 ( 0.1%) |
3.4regression | 2 ( 0.0%) | 43 ( 0.1%) |
Total | 3250 (44.8%) | 30002 (52.5%) |
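For reference, the percentages in the tables above appear to be each count divided by the number of open (7262) or all (57119) issues, which is why a table's total can exceed 100% when issues carry several values. The sketch below reproduces a few figures from the type table under that assumption; the `share` helper is illustrative only.

```python
OPEN_ISSUES = 7262
ALL_ISSUES = 57119


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Open-issue counts copied from the type table above.
type_counts_open = {
    "behavior": 2807,
    "enhancement": 2472,
    "crash": 184,
    "compile error": 161,
    "performance": 156,
    "resource usage": 78,
    "security": 65,
}

total_open = sum(type_counts_open.values())
print(total_open, share(total_open, OPEN_ISSUES))
print(share(type_counts_open["behavior"], OPEN_ISSUES))
```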
Can we drop ‘resolution’? GitHub doesn’t have this and I’ve never missed it. I don’t think I know of any project that introduced a set of labels with this purpose. We just explain the reason for closing in the message when we close it. I don’t recall ever searching for issues with a specific resolution.
I’ve also often wondered why we have ‘stage’.
Both `stage` and `resolution` mostly have an informative purpose. The `stage` tells what's the next thing needed to make the issue move forward (are we waiting for a fix? for a review?), whereas the `resolution` tells the reason why the issue was closed (e.g. was it fixed? rejected?).
I agree that the `resolution` can be dropped.
For the `stage` the situation is a bit more complicated, because on bpo we only had issues, whereas here we also have PRs. In addition, we already have a set of labels for PR stages that are added by @bedevere-bot automatically.
The current sequence on bpo is roughly
`no selection -> test needed -> needs patch -> patch review -> commit review -> backport needed -> resolved`
If an issue has `no selection`, it's usually because it's not triaged yet or because people are still figuring out whether it needs a fix or not. If instead a test to reproduce the issue is needed, we are in the `test needed` stage. I'm considering adding an `untriaged`/`new` label automatically on new issues, that can be removed as soon as someone comes around to triage them and the issue is being discussed (this also leaves the issue less label-cluttered).
Then, if the issue has no PR linked to it, we can assume we are either still discussing, or in the `needs patch` stage. If the PR is not a WIP or if a review has been requested, then we are in the `patch review` or `commit review` stage. The `backport needed` stage is handled by a bot and there is already a set of labels for each version. Once a PR/issue is merged/closed, the PR/issue is implicitly `resolved`. All this is already visible through the GitHub UI, without the need for labels.
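The inference described above can be sketched as a small function. This is only an illustration: the `IssueState` fields are hypothetical names for information that is visible in the GitHub UI, and treating a WIP PR as still being in `needs patch` is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class IssueState:
    """Hypothetical snapshot of what the GitHub UI shows for an issue."""
    closed: bool = False
    needs_backport: bool = False   # a "needs backport to *" label is set
    has_linked_pr: bool = False
    pr_is_draft: bool = False      # the linked PR is still a WIP
    review_requested: bool = False


def infer_stage(state):
    """Map the visible state to the closest bpo stage name."""
    if state.closed:
        return "resolved"
    if state.needs_backport:
        return "backport needed"
    if not state.has_linked_pr:
        # Either still under discussion or waiting for a patch.
        return "needs patch"
    if state.review_requested or not state.pr_is_draft:
        # Covers both "patch review" and "commit review" on bpo.
        return "patch review"
    # Assumption: a WIP PR without a review request is not reviewable yet.
    return "needs patch"


print(infer_stage(IssueState(has_linked_pr=True)))
```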
To summarize:
- The `resolution`s can be dropped (can be mentioned in the closing message)
- The `stage`s can be dropped, since they can be inferred by other elements in the UI (linked PRs, requested reviews, backport labels, closed/merged issue/PR)
- An `untriaged`/`new` label can be added for new issues, and removed once they are triaged
- Users can't add labels, so if an issue has no labels it's untriaged/new
- Maybe @brettcannon can comment on the intended use and actual usefulness of the existing stage labels added by bedevere now that they have been around for a while.
This is a proposed mapping.
type
- `type-bug`: "An unexpected behavior, bug, or error"
- `type-feature`: "A feature request or enhancement"
- `type-security`: "A security issue"
- `type-crash`: "A hard crash of the interpreter, possibly with a core dump"
In addition:
- It was suggested to expand `type-compile-error` to include all build errors (e.g. configure/Makefile issues). Since we already have a `build` label, the `type-compile-error` has been removed.
- Similarly, `performance` and `resource usage` have been replaced by a `performance` label that can be combined either with `type-bug` or `type-feature`
- `bug`, `crash`, `compile error` could be merged under `type-behavior` (users often have trouble telling them apart).
- ❓ Should we merge them or keep them separate?
- ✔️ `type-crash` has been kept, `compile error`s can be indicated with `type-bug` + `build`
- ❓ Should `crash` become a standalone label instead of a `type-*` label?
- We might want to get rid of `type-security` if security issues should be reported under the Security tab of the repo.
- I'm not sure if we can detect this when users select the issue type from the template, or when they add the label before they submit, but it could either be written in the template or be handled by an action after the report.
- ✔️ `type-bugfix` has been renamed to `type-bug`.
- ❓ do we need this classification for PRs when the issue is already classified?
- ✔️ `type-documentation` and `type-tests` have been renamed to `docs` and `tests`
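The type mapping above could be expressed as a lookup table along these lines. This is a sketch, not the importer's actual code: `BPO_TYPE_TO_LABELS` and `labels_for_type` are hypothetical names, mapping `behavior` to `type-bug` is an assumption based on the earlier note that `type-bugfix` replaces behavior, and `performance` is emitted on its own because choosing between `type-bug` and `type-feature` needs human judgment.

```python
# Assumed mapping from bpo "type" values to the proposed GitHub labels.
BPO_TYPE_TO_LABELS = {
    "behavior": ["type-bug"],
    "crash": ["type-crash"],
    "compile error": ["type-bug", "build"],
    "performance": ["performance"],
    "resource usage": ["performance"],
    "security": ["type-security"],
    "enhancement": ["type-feature"],
}


def labels_for_type(bpo_type):
    """Return the labels for a bpo type, or an empty list if it is unset."""
    # Copy the list so callers can't mutate the shared mapping.
    return list(BPO_TYPE_TO_LABELS.get(bpo_type, []))


print(labels_for_type("compile error"))
```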
stage
- We can remove stages
- ❓ We currently have `awaiting change review`, `awaiting changes`, `awaiting core review`, `awaiting merge`, `awaiting review` on `python/cpython` and `test needed`, `needs patch`, `patch review`, `commit review`, `backport needed`, `resolved`
- ❓ Should we map `patch review` and `commit review` to `awaiting review`?
components
Labels in this group are related to the location of the affected files:
- `library`: "Python modules in the Lib dir"
- `documentation`: "Documentation in the Doc dir"
- `interpreter-core`: "Interpreter core (Objects, Python, Grammar, and Parser dirs)"
- `extension-modules`: "C modules in the Modules dir"
- `tests`: "Tests in the Lib/test dir"
They could have their own namespace prefix (not sure what to use though, and the names are already long enough), or just a specific color.
expertise (was included in components before)
- `expert-asyncio`: this is already on python/cpython
- Could be grouped with `expert-*` or just by color
- ❓ What other components do we want to keep? (e.g. email, IDLE, IO, Unicode, etc.)
- asyncio -> `expert-asyncio`
- IDLE -> `expert-IDLE`
- Build -> `build`
- email -> `expert-email`
- IO -> `expert-IO`
- ctypes -> `expert-ctypes`
- C API -> `expert-C-API`
- Unicode -> `expert-unicode`
- Installation -> `expert-installation`
- Tkinter -> `expert-tkinter`
- SSL -> `expert-SSL`
- XML -> `expert-XML`
- 2to3 (2.x to 3.x conversion tool) -> `expert-2to3`
- Subinterpreters -> `expert-subinterpreters`
- Regular Expressions -> `expert-regex`
- Argument Clinic -> `expert-argument-clinic`
- FreeBSD and Demos and Tools have no corresponding labels, Cross-build and Build have been merged into build, Distutils has been included into library, Parser into interpreter-core.
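The component mapping listed above could be captured in a table like the following. This is a hedged sketch: `COMPONENT_TO_LABEL` and `labels_for_components` are hypothetical names, and components without a corresponding label are mapped to `None` so that they are dropped explicitly instead of silently.

```python
# bpo component -> proposed GitHub label (None means "no label").
COMPONENT_TO_LABEL = {
    "asyncio": "expert-asyncio",
    "IDLE": "expert-IDLE",
    "Build": "build",
    "Cross-Build": "build",
    "email": "expert-email",
    "IO": "expert-IO",
    "ctypes": "expert-ctypes",
    "C API": "expert-C-API",
    "Unicode": "expert-unicode",
    "Installation": "expert-installation",
    "Tkinter": "expert-tkinter",
    "SSL": "expert-SSL",
    "XML": "expert-XML",
    "2to3 (2.x to 3.x conversion tool)": "expert-2to3",
    "Subinterpreters": "expert-subinterpreters",
    "Regular Expressions": "expert-regex",
    "Argument Clinic": "expert-argument-clinic",
    "Distutils": "library",
    "Parser": "interpreter-core",
    "FreeBSD": None,
    "Demos and Tools": None,
}


def labels_for_components(components):
    """Translate bpo components to labels, skipping drops and duplicates."""
    labels = []
    for component in components:
        label = COMPONENT_TO_LABEL.get(component)
        if label is not None and label not in labels:
            labels.append(label)
    return labels


print(labels_for_components(["Build", "Cross-Build", "FreeBSD", "asyncio"]))
```

The location-based and OS components (Library, Documentation, Windows, and so on) are left out here since they are covered by the other label groups.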
OS (was included in components before)
- `OS-windows` and `OS-mac`: these are already on python/cpython
- We could add `OS-FreeBSD` and possibly others
- ❓ Any other OS that deserves a label?
- ✔️ no
versions
- We already have `needs backport to *` on python/cpython
- There is a discussion on Discourse about this
- In the same thread, it was suggested to just have labels to indicate if it only applies to `main`, if it should be backported to maintenance releases, and also to security releases
- This could be inferred by the issue type (feature, bug, security) and marked with the `needs backport to *` labels
- ❓ Should we remove versions, only keep two, or keep them all?
- ✔️ all active versions (`3.7`-`3.11`) have been kept. They can be converted to milestones after the migration.
resolution
- I only kept `invalid` (since it was already on python/cpython). There is also a `spam` label.
priority
- ❓ Are the RMs fine with using milestones/projects to track release/deferred, or do they prefer to have labels?
- ✔️ they are fine, but for now the `release blocker` and `deferred blocker` labels have been added. This will make it easier to identify issues and add them to milestone/projects.
keywords
- I only kept `easy`. The others are barely used.
- ❓ Is there any other keyword that we should keep?
- ✔️ no(?)
The intended use of the stage labels was to always have a rough idea as to why an issue is still open without having to read the entire issue to figure it out.
In case of priorities, I assume we only really need `release blocker` and `deferred blocker`.