python/docs-community

Add release cycle chart

hugovk opened this issue · 12 comments

In 2019 Dustin Ingram made this chart of the Python release cycle: https://python-release-cycle.glitch.me/

Jeff Triplett and I have forked it to make versions for Django and Drupal respectively. It's a simple static site: some HTML, JS and CSS using a Gantt chart from Google Charts.

Dustin recently asked:

Hugo, do you think there's any appetite for putting this in more official docs or web properties somewhere? I think there was talk about that at some point but never saw it happen.

I think it would be good; it has come up a couple of times before.

Options

If so, we have a number of options.

We could:

Proof of concept

For option two, I made a proof of concept using https://github.com/mgaitan/sphinxcontrib-mermaid to render a Gantt chart on the devguide: https://hugovk-devguide.readthedocs.io/en/mermaid/versions/

Right now it duplicates the dates, but once python/devguide#884 is merged, we could generate them from the CSV.

Further

Let's discuss in today's docs community meeting.

Notes from the meeting:

We thought it would be nice to have something like this on both https://python.org/downloads and https://devguide.python.org/versions. We'll start with the latter using the proof of concept (remembering the Diátaxis iterative model), wait for python/devguide#884 to be merged (now merged), then generate those CSV files and the Mermaid file from a JSON file.
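A minimal sketch of the JSON-to-CSV step, assuming the JSON source is a flat list of release objects. The field names are taken from the example JSON in this thread; the `to_csv` helper and the column order are hypothetical, not what the devguide actually does.

```python
import csv
import io
import json

# Assumed shape of the JSON source: a flat list of release objects.
JSON_SOURCE = """
[
  {
    "cycle": "main",
    "pep": 693,
    "status": "features",
    "releaseManager": "Thomas Wouters",
    "releaseDate": "2023-10-02",
    "eol": "2028-10"
  },
  {
    "cycle": "3.11",
    "pep": 664,
    "status": "bugfix",
    "releaseManager": "Pablo Galindo Salgado",
    "releaseDate": "2022-10-24",
    "eol": "2027-10"
  }
]
"""

# Hypothetical column order for the generated table.
FIELDNAMES = ["cycle", "pep", "status", "releaseManager", "releaseDate", "eol"]


def to_csv(releases):
    """Render a list of release dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(releases)
    return buffer.getvalue()


csv_text = to_csv(json.loads(JSON_SOURCE))
print(csv_text)
```

In a real build the text would be written to a file that the Sphinx table reads, rather than printed.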

I've made a start on this.

See also @encukou's python/devguide#884 (comment):

FWIW, I think the source should be a human-writable format, and we should generate a JSON for the machines to consume (like peps.python.org/api/peps.json). And we should tell people to expect the source format to change whenever we feel like it. (That'll need a Sphinx plugin, but if we want to move the formatting out of the tables, one will be needed anyway.)

@encukou Did you have a human-writable format in mind? It also needs to be machine readable.

A flat, nicely-formatted JSON file should be easily human-writable? For example:

[
  {
    "cycle": "main",
    "pep": 693,
    "status": "features",
    "releaseManager": "Thomas Wouters",
    "releaseDate": "2023-10-02",
    "eol": "2028-10"
  },
  {
    "cycle": "3.11",
    "pep": 664,
    "status": "bugfix",
    "releaseManager": "Pablo Galindo Salgado",
    "releaseDate": "2022-10-24",
    "eol": "2027-10"
  },
...

Writing JSON is a pain.

Did you have a human-writable format in mind? It also needs to be machine readable.

I helped add tomllib for this! (Well, actually for Misc/stable_abi.toml – the same story). But depending on 3.11+ (or an external dependency) might not be appropriate.
My second choice is configparser/INI. See the PEP 518 table – most of INI's downsides don't apply if we don't care for third-party (non-Python) tools, and if the schema is simple.
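For illustration, here is roughly what the INI option could look like with configparser, using one section per release cycle. The key names mirror the JSON example above; the section layout and the `load_releases` helper are assumptions, not an agreed schema.

```python
import configparser

# Hypothetical INI source: one section per release cycle.
INI_SOURCE = """
[main]
pep = 693
status = features
releaseManager = Thomas Wouters
releaseDate = 2023-10-02
eol = 2028-10

[3.11]
pep = 664
status = bugfix
releaseManager = Pablo Galindo Salgado
releaseDate = 2022-10-24
eol = 2027-10
"""


def load_releases(text):
    """Parse INI text into a dict mapping cycle name to its fields."""
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve key case ("releaseManager"); the default lowercases keys.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


releases = load_releases(INI_SOURCE)
print(releases["3.11"]["status"])
```

Note that every value comes back as a string (`"664"`, not `664`), so any typed fields would need converting before being exported for machines.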

Link to pydotorg issue: python/pythondotorg#2066
For reference, original discussion about all this: endoflife-date/endoflife.date#711

Writing JSON is a pain.

If we use a format similar to the one @hugovk proposed it shouldn't be too bad: we will either have to update some dates here and there or copy/paste/update a section when a new release comes out.

If we use JSON, it will also be easier for other people to consume; if we don't, we will either have to generate JSON or they will have to add extra dependencies to handle TOML/config files (or worse, parse them manually).

Also note that python/pythondotorg#2066 requested that patch versions be included as well (in the JSON).

In general, I'm strongly in favor of TOML over JSON as a human-readable and, even more importantly, human-writable format, and make heavy use of it as such. However, for this specific case, I believe practicality strongly outweighs purity.

This use case is tightly constrained: one specific, relatively short and simple file whose format we fully control, updated by a very small number of experienced people in a highly constrained manner. Updates are either tweaks to one or a few existing values or (once per version/year) copy/pasting a new version block. So the practical downsides of JSON vs. TOML are minimized, whereas the former's greater simplicity and concision are maximized.

By contrast, generating JSON from TOML rather than simply using it as the source requires non-trivial complexity for users, contributors and maintainers of the devguide, a cost which seems likely to equal or outweigh any benefit for this particular case, as others have mentioned.

If we ever change the contents -- e.g. add more info that's currently in release PEPs in unstructured form -- we might want to change the input format as well. If all we have is JSON that people are depending on, we'll have painted ourselves into a corner: we won't be able to make it more complex while keeping it maintainable.
I recommend biting the bullet and generating a file for external use now, and strongly discouraging people from using the source file.

I'm not sure I follow, sorry. If we wanted to add more info, or even a modified form of existing info, couldn't we just add a new field, which would of course be backward compatible with existing users—and would that really be any different whether or not we generated the file, in that case? And if we ever did want to change the input format, at any time couldn't we just switch to generating the public JSON from some other file, transparently to existing clients?

couldn't we just add a new field

Yes. But if the field is a list of dicts, the file is no longer simple.

And if we ever did want to change the input format, at any time couldn't we just switch to generating the public JSON from some other file, transparently to existing clients?

Probably. Or it could be messy. How do you check?
I'm writing under the assumption that if you're already building CSV from JSON, it's trivial to build both CSV and JSON from INI. If that's not the case, let's not do it.
But if it is... I've never regretted separating human-writable files from machine-readable ones.

Yes. But if the field is a list of dicts, the file is no longer simple.

A list of releases, each with a mapping of key-value pairs, is naturally represented by a list of objects in JSON, or as rows and columns of a CSV, whereas I personally find the equivalent TOML list of tables a little awkward to use, and INI doesn't natively have lists at all. If we instead used a mapping of versions to version data, like we do for peps.json, this would allow direct, efficient access to a desired version and would work just as well in JSON (as an object of objects) and CSV (as before), while also working with INI (as sections of key-values) and would use more common TOML tables, though it would require quoting the table names.
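To make the two shapes concrete, a small sketch that converts the list-of-objects form into a mapping keyed by version. The `by_cycle` helper is hypothetical and the data is trimmed from the JSON example earlier in the thread.

```python
# List-of-objects form, as in the JSON example earlier in the thread (trimmed).
releases_list = [
    {"cycle": "main", "pep": 693, "status": "features"},
    {"cycle": "3.11", "pep": 664, "status": "bugfix"},
]


def by_cycle(releases):
    """Return a mapping of cycle name to the remaining fields of each release."""
    mapping = {}
    for release in releases:
        fields = dict(release)  # copy, so the input list is left untouched
        cycle = fields.pop("cycle")
        mapping[cycle] = fields
    return mapping


releases_by_cycle = by_cycle(releases_list)

# Direct lookup by version instead of scanning the list.
print(releases_by_cycle["3.11"]["status"])
```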

If you mean a key inside one of those that is itself a list of dicts (or anything else more complex than string, bool or number), then yeah, that's certainly not simple. However, it isn't natively representable in CSV, so it would be a significant change from the initial proposal here (where CSV is the primary internal consumption format for display as a table), nor are there any such constructs in INI (without making up our own bespoke sub-format). TOML is (IMO) easier to use for human entry in that case, but if and when we get to that point, we could always consider it then.

Probably. Or it could be messy. How do you check?

With the same formatting options and the same data, the files will be byte-identical. Otherwise, json.load(old_json_f) == json.load(new_json_f), no? Or maybe there's something else I'm missing?
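A sketch of that check, using in-memory files in place of the real old and new outputs. The `same_json` helper and the sample data are made up for illustration.

```python
import io
import json


def same_json(old_f, new_f):
    """Compare two JSON files by parsed content rather than by bytes."""
    return json.load(old_f) == json.load(new_f)


# Same data, different whitespace and key order.
old_json_f = io.StringIO('{"3.11": {"pep": 664, "status": "bugfix"}}')
new_json_f = io.StringIO('{\n  "3.11": {"status": "bugfix", "pep": 664}\n}')
unchanged = same_json(old_json_f, new_json_f)

# A real content change is detected.
edited_json_f = io.StringIO('{"3.11": {"pep": 664, "status": "security"}}')
changed = not same_json(io.StringIO('{"3.11": {"pep": 664, "status": "bugfix"}}'), edited_json_f)

print(unchanged, changed)
```

Object key order and whitespace don't affect the comparison, but the order of items inside JSON arrays does, so a list-based layout would need to keep its ordering stable.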

I'm writing under the assumption that if you're already building CSV from JSON, it's trivial to build both CSV and JSON from INI. If that's not the case, let's not do it.

Theoretically, yeah, if you use a mapping of mappings and only strings for the values, though personally, I'd rather write JSON than INI, not least because, as it's an old ad hoc format, I'd have to remember which of the innumerable variations we were using :). For future extensions, though, INI would basically be a non-starter if we wanted to have values more complex than strings/basic data types, as you suggest above.

TOML seems to make more sense, as it is standardized and can express nearly everything JSON can; we'd just need to add tomli as a dep (or use tomllib on 3.11+), and it would indeed seem straightforward to read in TOML and write out JSON instead of just reading in JSON directly. I guess it just comes down to the benefit of using TOML rather than JSON directly given our current requirements, since it seems to me we could just as easily switch in the future as implement it now: it would only change the input format that a small number of people (us) directly interact with.
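A rough sketch of the TOML-in, JSON-out route, assuming a mapping of versions to version data with quoted table names. The TOML layout and the `toml_to_json` helper are hypothetical; `tomli` is only needed on Python older than 3.11.

```python
import json

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport, assumed installed on older Pythons

# Hypothetical TOML source. The table name is quoted because a bare [3.11]
# would be read as table "3" containing a sub-table "11".
TOML_SOURCE = """
["3.11"]
pep = 664
status = "bugfix"
releaseManager = "Pablo Galindo Salgado"
releaseDate = "2022-10-24"
eol = "2027-10"
"""


def toml_to_json(text):
    """Parse TOML text and re-serialize it as indented JSON."""
    data = tomllib.loads(text)
    # Dates are kept as quoted strings in the source: an unquoted TOML date
    # would parse to a datetime.date, which json.dumps cannot serialize as-is.
    return json.dumps(data, indent=2)


public_json = toml_to_json(TOML_SOURCE)
print(public_json)
```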

If this can easily be changed later, there's not much point debating it. You don't have to convince me, I just hope whoever implements this takes my suggestion into account :)

That said,

INI [is] an old ad hoc format, I'd have to remember which of the innumerable variations we were using :).

That's easy -- normally you copy bits from the rest of the file, like with JSON. When you're changing the schema, you control the dialect directly.
(This might even be an advantage, as it encourages people to use the JSON export.)

Anyway, let the implementer choose the bikeshed colour here.

That's easy -- normally you copy bits from the rest of the file, like with JSON. When you're changing the schema, you control the dialect directly.

Yeah, and as you imply, we'd previously made that same argument in JSON's defense, so it would be unfair to not give INI the same benefit of the doubt on that point :)

Please see python/devguide#988 for a first draft. This does JSON -> CSV + MMD.
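For readers who haven't used Mermaid, a minimal sketch of what the JSON -> MMD half could look like. This is not the code from python/devguide#988: the `to_mermaid` and `full_date` helpers, the chart title, and the choice to pad `YYYY-MM` end-of-life values to the first of the month are all assumptions for illustration.

```python
# Release data in the shape of the JSON example earlier in the thread (trimmed).
releases = [
    {"cycle": "main", "status": "features", "releaseDate": "2023-10-02", "eol": "2028-10"},
    {"cycle": "3.11", "status": "bugfix", "releaseDate": "2022-10-24", "eol": "2027-10"},
]


def full_date(value):
    """Pad a YYYY-MM value to YYYY-MM-01; leave full YYYY-MM-DD dates alone."""
    return value if len(value) == 10 else f"{value}-01"


def to_mermaid(releases):
    """Build Mermaid Gantt source with one section and one bar per release."""
    lines = [
        "gantt",
        "    dateFormat YYYY-MM-DD",
        "    title Python release cycle",  # hypothetical title
    ]
    for release in releases:
        start = full_date(release["releaseDate"])
        end = full_date(release["eol"])
        lines.append(f"    section {release['cycle']}")
        lines.append(f"    {release['status']} :{start}, {end}")
    return "\n".join(lines) + "\n"


chart = to_mermaid(releases)
print(chart)
```

The resulting text could then be saved as a file and rendered through sphinxcontrib-mermaid, as in the proof of concept.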