Map bpo issue metadata to GitHub fields/labels

Closed this issue 2 years ago · 7 comments

Answer 1 · 2021-06-21T02:39:34.000Z

An additional consideration: metadata on bpo supports searching with powerful filtering and ordering, along with the ability to save searches. GitHub's search is weaker here, so browsing and searching issues after the migration will be less pleasant 😦 Maybe all types, components and keywords will need to be converted to labels, which will produce a huge flat list. Maybe a custom page (a JavaScript app) will be needed to offer a better search experience.
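For reference, labels can still be combined in GitHub's issue search with the `label:` qualifier. Here is a minimal sketch of building such a query string; the helper name is made up and the label names are placeholders, since the final label set was not decided at this point in the thread.

```python
def build_issue_query(repo, labels, state="open", sort="updated-desc"):
    """Build a GitHub issue-search query string from a list of labels.

    Hypothetical helper for illustration; only the qualifiers themselves
    (repo:, is:, label:, sort:) are GitHub search syntax.
    """
    parts = [f"repo:{repo}", "is:issue", f"is:{state}"]
    for label in labels:
        # Labels that contain spaces have to be quoted.
        quoted = f'"{label}"' if " " in label else label
        parts.append(f"label:{quoted}")
    parts.append(f"sort:{sort}")
    return " ".join(parts)


# Placeholder labels: several label: qualifiers narrow the search (AND).
print(build_issue_query("python/cpython", ["type-bug", "expert-asyncio"]))
```

This covers filtering and ordering, but not saved searches, which would still need bookmarks or a custom page.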

Answer 2 · 2021-10-06T08:00:43.000Z

Here are some stats on bpo field usage that might help decide which fields to keep. The percentages in each table can add up to more than 100% because an issue can have more than one value for a field (e.g. multiple components or versions).

Issues (open/all): 7262/57119

type

bpo field	open	all
behavior	2807 (38.7%)	17747 (31.1%)
enhancement	2472 (34.0%)	11468 (20.1%)
crash	184 ( 2.5%)	2210 ( 3.9%)
compile error	161 ( 2.2%)	1381 ( 2.4%)
performance	156 ( 2.1%)	1182 ( 2.1%)
resource usage	78 ( 1.1%)	890 ( 1.6%)
security	65 ( 0.9%)	464 ( 0.8%)
Total	5923 (81.6%)	35342 (61.9%)

stage

bpo field	open	all
patch review	2099 (28.9%)	2884 ( 5.0%)
needs patch	886 (12.2%)	1623 ( 2.8%)
test needed	297 ( 4.1%)	874 ( 1.5%)
resolved	73 ( 1.0%)	27057 (47.4%)
commit review	21 ( 0.3%)	288 ( 0.5%)
backport needed	1 ( 0.0%)	2 ( 0.0%)
Total	3377 (46.5%)	32728 (57.3%)

components

bpo field	open	all
Library (Lib)	2738 (37.7%)	16043 (28.1%)
Documentation	1054 (14.5%)	8726 (15.3%)
Interpreter Core	630 ( 8.7%)	7853 (13.7%)
Windows	479 ( 6.6%)	3162 ( 5.5%)
Extension Modules	360 ( 5.0%)	3176 ( 5.6%)
Tests	350 ( 4.8%)	3483 ( 6.1%)
asyncio	279 ( 3.8%)	970 ( 1.7%)
IDLE	274 ( 3.8%)	1479 ( 2.6%)
Build	271 ( 3.7%)	2641 ( 4.6%)
email	160 ( 2.2%)	447 ( 0.8%)
IO	140 ( 1.9%)	644 ( 1.1%)
macOS	119 ( 1.6%)	1253 ( 2.2%)
ctypes	117 ( 1.6%)	477 ( 0.8%)
C API	105 ( 1.4%)	274 ( 0.5%)
Unicode	102 ( 1.4%)	950 ( 1.7%)
Installation	96 ( 1.3%)	789 ( 1.4%)
Tkinter	94 ( 1.3%)	821 ( 1.4%)
SSL	63 ( 0.9%)	316 ( 0.6%)
XML	58 ( 0.8%)	457 ( 0.8%)
2to3 (2.x to 3.x conversion tool)	57 ( 0.8%)	342 ( 0.6%)
Cross-Build	54 ( 0.7%)	161 ( 0.3%)
Demos and Tools	44 ( 0.6%)	512 ( 0.9%)
Subinterpreters	38 ( 0.5%)	72 ( 0.1%)
Regular Expressions	37 ( 0.5%)	519 ( 0.9%)
Argument Clinic	36 ( 0.5%)	123 ( 0.2%)
FreeBSD	9 ( 0.1%)	33 ( 0.1%)
Parser	9 ( 0.1%)	35 ( 0.1%)
Distutils	5 ( 0.1%)	1141 ( 2.0%)
Total	7778 (107.1%)	56899 (99.6%)

versions

bpo field	open	all
Python 3.8	2046 (28.2%)	6851 (12.0%)
Python 3.9	1845 (25.4%)	5067 ( 8.9%)
Python 3.7	1706 (23.5%)	7442 (13.0%)
Python 3.10	1452 (20.0%)	3508 ( 6.1%)
Python 3.6	1390 (19.1%)	7054 (12.3%)
Python 3.11	541 ( 7.4%)	1203 ( 2.1%)
Total	8980 (123.7%)	31125 (54.5%)

resolution

bpo field	open	all
fixed	21 ( 0.3%)	24291 (42.5%)
not a bug	11 ( 0.2%)	6178 (10.8%)
duplicate	7 ( 0.1%)	3720 ( 6.5%)
wont fix	7 ( 0.1%)	2295 ( 4.0%)
third party	7 ( 0.1%)	701 ( 1.2%)
remind	5 ( 0.1%)	18 ( 0.0%)
out of date	4 ( 0.1%)	3145 ( 5.5%)
postponed	4 ( 0.1%)	114 ( 0.2%)
works for me	4 ( 0.1%)	952 ( 1.7%)
later	3 ( 0.0%)	154 ( 0.3%)
rejected	3 ( 0.0%)	2801 ( 4.9%)
Total	76 ( 1.0%)	44369 (77.7%)

priority

bpo field	open	all
normal	6951 (95.7%)	51387 (90.0%)
low	229 ( 3.2%)	2483 ( 4.3%)
high	55 ( 0.8%)	1583 ( 2.8%)
critical	10 ( 0.1%)	449 ( 0.8%)
release blocker	2 ( 0.0%)	933 ( 1.6%)
deferred blocker	1 ( 0.0%)	107 ( 0.2%)
Total	7248 (99.8%)	56942 (99.7%)

keywords

bpo field	open	all
patch	2878 (39.6%)	25886 (45.3%)
easy	202 ( 2.8%)	2139 ( 3.7%)
needs review	84 ( 1.2%)	928 ( 1.6%)
newcomer friendly	17 ( 0.2%)	98 ( 0.2%)
easy (C)	11 ( 0.2%)	75 ( 0.1%)
3.5regression	10 ( 0.1%)	60 ( 0.1%)
pep3121	8 ( 0.1%)	57 ( 0.1%)
buildbot	7 ( 0.1%)	328 ( 0.6%)
3.6regression	7 ( 0.1%)	47 ( 0.1%)
3.8regression	6 ( 0.1%)	55 ( 0.1%)
3.3regression	3 ( 0.0%)	79 ( 0.1%)
3.7regression	3 ( 0.0%)	60 ( 0.1%)
3.9regression	3 ( 0.0%)	36 ( 0.1%)
3.10regression	3 ( 0.0%)	26 ( 0.0%)
gsoc	2 ( 0.0%)	19 ( 0.0%)
3.2regression	2 ( 0.0%)	31 ( 0.1%)
security_issue	2 ( 0.0%)	35 ( 0.1%)
3.4regression	2 ( 0.0%)	43 ( 0.1%)
Total	3250 (44.8%)	30002 (52.5%)
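
To illustrate why a table's total can exceed 100%, here is a rough sketch of how such counts could be computed. The data layout (a list of dicts with list-valued fields) and the sample issues are invented for the example; they are not the real bpo schema or the script used for the stats above.

```python
from collections import Counter


def field_usage(issues, field):
    """Return (value, count, percent) rows for a multi-valued field.

    Percentages are relative to the number of issues, so an issue that
    has two values for the field is counted once under each value.
    """
    counts = Counter()
    for issue in issues:
        # set() so a value repeated on one issue is counted once.
        counts.update(set(issue.get(field, [])))
    total = max(len(issues), 1)  # avoid dividing by zero on an empty list
    rows = [(value, n, 100 * n / total) for value, n in counts.most_common()]
    grand = sum(counts.values())
    rows.append(("Total", grand, 100 * grand / total))
    return rows


# Made-up sample: 4 issues, one with no component, two with two components.
sample = [
    {"components": ["Library (Lib)", "Tests"]},
    {"components": ["Library (Lib)"]},
    {"components": []},
    {"components": ["Documentation", "Tests"]},
]
for value, n, pct in field_usage(sample, "components"):
    print(f"{value}\t{n} ({pct:.1f}%)")
```

In the sample, five component values are spread over four issues, so the Total row is above 100% even though one issue has no component at all.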

Answer 3 · 2021-10-06T14:40:17.000Z

Can we drop ‘resolution’? GitHub doesn’t have this and I’ve never missed it. I don’t think I know of any project that introduced a set of labels with this purpose. We just explain the reason for closing in the message when we close it. I don’t recall ever searching for issues with a specific resolution.

I’ve also often wondered why we have ‘stage’.

Answer 4 · 2021-10-06T18:39:51.000Z

Both stage and resolution mostly have an informative purpose. The stage tells what's the next thing needed to make the issue move forward (are we waiting for a fix? for a review?), whereas the resolution tells the reason why the issue was closed (e.g. was it fixed? rejected?).

I agree that the resolution can be dropped.

For the stage the situation is a bit more complicated, because on bpo we only had issues, whereas here we also have PRs. In addition, we already have a set of labels for PR stages that are added by @bedevere-bot automatically.

The current sequence on bpo is roughly:
no selection -> test needed -> needs patch -> patch review -> commit review -> backport needed -> resolved

If an issue has no selection, it's usually because it's not triaged yet or because people are still figuring out whether it needs a fix or not. If instead a test to reproduce the issue is needed, we are in the test needed stage. I'm considering adding an untriaged/new label automatically on new issues, that can be removed as soon as someone comes around to triage them and the issue is being discussed (this also leaves the issue less label-cluttered).

Then, if the issue has no PR linked to it, we can assume we are either still discussing, or in the needs patch stage. If the PR is not a WIP or if a review has been requested, then we are in the patch review or commit review stage. The backport needed stage is handled by a bot and there is already a set of labels for each version. Once a PR/issue is merged/closed, the PR/issue is implicitly resolved. All this is already visible through the GitHub UI, without the need for labels.
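
The inference described above can be sketched as a small function. The `Issue` and `PullRequest` classes are hypothetical stand-ins rather than GitHub API objects, and the ordering of the checks is one possible reading of the rules, not an agreed policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PullRequest:  # hypothetical stand-in for a linked PR
    draft: bool = False
    review_requested: bool = False
    merged: bool = False
    labels: List[str] = field(default_factory=list)


@dataclass
class Issue:  # hypothetical stand-in for an issue
    closed: bool = False
    labels: List[str] = field(default_factory=list)
    linked_pr: Optional[PullRequest] = None


def infer_stage(issue):
    """Guess the old bpo stage from what the GitHub UI already shows."""
    if issue.closed:
        return "resolved"
    pr = issue.linked_pr
    if pr is None:
        # Users can't add labels, so an unlabeled issue is untriaged.
        return "untriaged" if not issue.labels else "needs patch"
    if pr.merged:
        # Backports are tracked by the "needs backport to *" labels.
        pending = any(l.startswith("needs backport to ") for l in pr.labels)
        return "backport needed" if pending else "resolved"
    if pr.review_requested or not pr.draft:
        return "patch review"
    return "needs patch"  # a WIP PR with no review requested yet


print(infer_stage(Issue(labels=["type-bug"], linked_pr=PullRequest())))
```

Nothing here needs a dedicated stage label; every input is already attached to the issue or its PR.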

To summarize:

- The resolutions can be dropped (the reason can be given in the closing message)
- The stages can be dropped, since they can be inferred from other elements in the UI (linked PRs, requested reviews, backport labels, closed/merged issue/PR)
- ~~An untriaged/new label can be added for new issues, and removed once they are triaged~~ Users can't add labels, so if an issue has no labels it's untriaged/new

Maybe @brettcannon can comment on the intended use and actual usefulness of the existing stage labels added by bedevere now that they have been around for a while.

Answer 5 · 2021-10-14T09:25:47.000Z

This is a proposed mapping.

type

type-bug: "An unexpected behavior, bug, or error"
type-feature: "A feature request or enhancement"
type-security: "A security issue"
type-crash: "A hard crash of the interpreter, possibly with a core dump"

In addition:

It was suggested to expand type-compile-error to include all build errors (e.g. configure/Makefile issues). Since we already have a build label, the type-compile-error label has been removed instead.
Similarly, performance and resource usage have been replaced by a performance label that can be combined with either type-bug or type-feature.
bug, crash, compile error could be merged under type-behavior (users often have trouble telling them apart).
- ❓ Should we merge them or keep them separate?
  - ✔️ type-crash has been kept, compile errors can be indicated with type-bug + build
- ❓ Should crash become a standalone label instead of a type-* label?
We might want to get rid of type-security if security issues should be reported under the Security tab of the repo.
I'm not sure if we can detect this when users select the issue type from the template, or when they add the label before they submit, but it could either be written in the template or be handled by an action after the report.
✔️ type-bugfix has been renamed to type-bug.
- ❓ do we need this classification for PRs when the issue is already classified?
✔️ type-documentation and type-tests have been renamed to docs and tests
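
As a lookup table, the type mapping above might look like the sketch below. The behavior and enhancement rows are inferred from the label descriptions rather than stated outright, and the function name is made up for the example.

```python
# bpo "type" value -> GitHub labels, following the proposal above.
TYPE_LABELS = {
    "behavior": ["type-bug"],              # inferred from the label description
    "enhancement": ["type-feature"],       # inferred from the label description
    "crash": ["type-crash"],
    "compile error": ["type-bug", "build"],
    # These two also need type-bug or type-feature, which cannot be
    # decided automatically, so only the shared label is emitted here.
    "performance": ["performance"],
    "resource usage": ["performance"],
    "security": ["type-security"],
}


def labels_for_type(bpo_type):
    """Return the labels for a bpo type, or [] if the type is unset/unknown."""
    return list(TYPE_LABELS.get(bpo_type, []))


print(labels_for_type("compile error"))
```

If type-security ends up being dropped in favor of the Security tab, only the last row would change.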

stage

We can remove stages
❓ We currently have awaiting change review, awaiting changes, awaiting core review, awaiting merge, and awaiting review on python/cpython, and test needed, needs patch, patch review, commit review, backport needed, and resolved on bpo
- ❓ Should we map patch review and commit review to awaiting review?

components

Labels in this group are related to the location of the affected files:

library: "Python modules in the Lib dir"
documentation: "Documentation in the Doc dir"
interpreter-core: "Interpreter core (Objects, Python, Grammar, and Parser dirs)"
extension-modules: "C modules in the Modules dir"
tests: "Tests in the Lib/test dir"

They could have their own namespace prefix (not sure what to use though, and the names are already long enough), or just a specific color.

expertise (was included in components before)

expert-asyncio: this is already on python/cpython
Could be grouped with expert-* or just by color
❓ What other components do we want to keep? (e.g. email, IDLE, IO, Unicode, etc.)
- asyncio -> expert-asyncio
- IDLE -> expert-IDLE
- Build -> build
- email -> expert-email
- IO -> expert-IO
- ctypes -> expert-ctypes
- C API -> expert-C-API
- Unicode -> expert-unicode
- Installation -> expert-installation
- Tkinter -> expert-tkinter
- SSL -> expert-SSL
- XML -> expert-XML
- 2to3 (2.x to 3.x conversion tool) -> expert-2to3
- Subinterpreters -> expert-subinterpreters
- Regular Expressions -> expert-regex
- Argument Clinic -> expert-argument-clinic
FreeBSD and Demos and Tools have no corresponding labels. Cross-Build and Build have been merged into build, Distutils has been folded into library, and Parser into interpreter-core.
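
Put together, the component and expertise mapping above could be expressed roughly as follows. The dict and function names are made up for the example; the label names are the proposed ones, and OS components are left out because they are covered separately.

```python
# bpo component -> proposed GitHub label (None = no corresponding label).
COMPONENT_LABELS = {
    "Library (Lib)": "library",
    "Documentation": "documentation",
    "Interpreter Core": "interpreter-core",
    "Extension Modules": "extension-modules",
    "Tests": "tests",
    "asyncio": "expert-asyncio",
    "IDLE": "expert-IDLE",
    "Build": "build",
    "Cross-Build": "build",
    "email": "expert-email",
    "IO": "expert-IO",
    "ctypes": "expert-ctypes",
    "C API": "expert-C-API",
    "Unicode": "expert-unicode",
    "Installation": "expert-installation",
    "Tkinter": "expert-tkinter",
    "SSL": "expert-SSL",
    "XML": "expert-XML",
    "2to3 (2.x to 3.x conversion tool)": "expert-2to3",
    "Subinterpreters": "expert-subinterpreters",
    "Regular Expressions": "expert-regex",
    "Argument Clinic": "expert-argument-clinic",
    "Distutils": "library",
    "Parser": "interpreter-core",
    "FreeBSD": None,
    "Demos and Tools": None,
}


def labels_for_components(components):
    """Map bpo components to labels, dropping duplicates and unmapped ones."""
    labels = []
    for component in components:
        label = COMPONENT_LABELS.get(component)
        if label and label not in labels:
            labels.append(label)
    return labels


print(labels_for_components(["Build", "Cross-Build", "asyncio", "FreeBSD"]))
```

Merged components collapse to a single label, so an issue tagged with both Build and Cross-Build gets build only once.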

OS (was included in components before)

OS-windows and OS-mac: these are already on python/cpython
We could add OS-FreeBSD and possibly others
❓ Any other OS that deserves a label?
- ✔️ no

versions

We already have needs backport to * on python/cpython
There is a discussion on Discourse about this
In the same thread, it was suggested to just have labels to indicate if it only applies to main, if it should be backported to maintenance releases, and also to security releases
- This could be inferred by the issue type (feature, bug, security) and marked with the needs backport to * labels
❓ Should we remove versions, only keep two, or keep them all?
- ✔️ all active versions (3.7-3.11) have been kept. They can be converted to milestones after the migration.

resolution

I only kept invalid (since it was already on python/cpython). There is also a spam label.

priority

❓ Are the RMs fine with using milestones/projects to track release/deferred blockers, or do they prefer to have labels?
- ✔️ they are fine, but for now the release blocker and deferred blocker labels have been added. This will make it easier to identify issues and add them to milestones/projects.

keywords

I only kept easy. The others are barely used.
❓ Is there any other keyword that we should keep?
- ✔️ no(?)

Answer 6 · 2021-10-18T20:41:30.000Z

The intended use of the stage labels was to always have a rough idea as to why an issue is still open without having to read the entire issue to figure it out.

Answer 7 · 2022-02-18T18:55:58.000Z

As for priorities, I assume we only really need release blocker and deferred blocker.