w3c/process

We need a process for handling registries, APIs and other 'enumerations'

Closed this issue · 127 comments

Currently mixed into #79, but the 'Registry' question is much more limited and we should consider attacking it independently.

The AB has a discussion page on Registries at https://www.w3.org/wiki/Registries

There are also other kinds of items, such as vocabularies and accessibility mappings, that could fit with registries as well.

Leaving this in the AB's hands for now; they can point out what the process consequences are when they work it out.

"Registries" suggests the rather narrow IANA-like operation, whereas the requests also include APIs, etc.

An example is the Media Source Extensions Byte Stream Format Registry, which maintains a mapping between MIME-type/subtype pairs and byte stream format specifications for use with MSE.

Another example: the EPSG registry, which is the authoritative source of coordinate reference system definitions. The OGC maintains a shadow copy of this registry.

Here's yet another example: the Open Metadata Registry, which has some very well thought-out features for versioning and change control.

The stats registry in https://w3.github.io/webrtc-stats/ is one example of a spec that tries to perform the job of a registry. That example is somewhat doubtful, though: we still want the stats' behavior documented and published too.

Note also

https://www.w3.org/2011/07/regreq.html

TTML uses:

https://www.w3.org/wiki/TTML/RoleRegistry
https://www.w3.org/wiki/TTML/ItemNameRegistry

And, perhaps more importantly, the media type registry with short-form profile designators for TTML is at:

https://www.w3.org/TR/ttml-profile-registry/

There is an XPointer registry:
https://www.w3.org/2005/04/xpointer-schemes/
It has an ad-hoc script for adding entries:
https://www.w3.org/2005/04/xpointer-schemes/0register
We don't know whether that just deposits email in someone's mailbox, or whether that person knows of their mandate.

edent commented

I'd like to mention the UK Government Registers project - https://www.registers.service.gov.uk/

It provides an authoritative - and non-revocable - list of "things".

See user stories at https://gds.blog.gov.uk/2015/09/01/registers-authoritative-lists-you-can-trust/ and https://gds.blog.gov.uk/2015/10/13/the-characteristics-of-a-register/

It looks like WebAuthn has some; see w3c/webauthn#1177.

dret commented

in case anybody is interested in reading a bit about how registries are usually managed, what is managed, and how to do it, https://tools.ietf.org/html/draft-wilde-registries might be useful. it's expired but still up to date, and if anybody has feedback or input for that document, i would be delighted to hear about it.

dret commented

as a second possibly useful resource, https://github.com/dret/RegMan/blob/master/W3C.md has a list of current W3C specs that ideally should use registries, but use a variety of other ways to do it because W3C doesn't currently support registries.
there are probably more specs where having a registry would have been beneficial, but people did not do it because there currently is no culture or support for doing it at the W3C.

dret commented

i have also posted this over at #79, but this here may be the more specific issue: i just updated the draft of my "the use of registries" document, and i have added a list of w3c specifications that currently use some shape or form of a "registry", but do so in a rather ad-hoc fashion because there is no process or even guidance (afaict). here is a direct link to the section listing W3C specifications: https://tools.ietf.org/html/draft-wilde-registries-02#appendix-A

dret commented

comments here in the issue are fine

dret commented
  • "A registry is a table that documents logically independent 'atoms'; conceptually a table with independent rows" -> it is also a set of rules for how to manage that table. often the table also has columns with specific meaning, such as timestamps, author info, or a "deprecated" label.
  • "Registries are purely documentational and contain no requirements." -> in most cases they do. most require the meaning of entries to be kept stable over time. most have additional requirements in terms of adding/changing/deprecating/removing entries.
  • " hosted in a way that preserves history (e.g. Git, a Wiki)" -> it's rather different to keep an edit history of a web page that represents a registry (such as a wiki), or to keep a history of the actual registry changes. it may make sense to require the latter, so that for example machinery can be built around managing changes. in this day and age, one might even go as far as requiring an API that makes both the content as well as the history machine-readable.
  • in the "The registry:" section, maybe add the API, a published/machine-readable history of the registry actions, possibly provide an outline of standard columns (value, description, timestamp, deprecation, reference to definition, ...)
  • it might make sense to provide a "menu" of registry policies that w3c groups can choose from, so that they don't have to write their own. they may be allowed to write their own, but in most cases, they'd probably happily choose from a small set of standard choices.
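A minimal sketch of what the suggested "standard columns" plus a machine-readable history of registry actions could look like, written in Python. The field names (`value`, `description`, `reference`, `timestamp`, `deprecated`) and the `Registry` class are hypothetical illustrations chosen for this example, not an existing W3C or IETF format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RegistryEntry:
    # Hypothetical "standard columns" for one registry row.
    value: str
    description: str
    reference: str   # link to the defining specification
    timestamp: str   # when the entry was registered (ISO 8601, UTC)
    deprecated: bool = False


@dataclass
class Registry:
    entries: dict = field(default_factory=dict)
    # Log of registry actions, kept separately from the edit history of
    # whatever page happens to display the table.
    history: list = field(default_factory=list)

    def _log(self, action: str, value: str) -> str:
        now = datetime.now(timezone.utc).isoformat()
        self.history.append({"action": action, "value": value, "timestamp": now})
        return now

    def add(self, value: str, description: str, reference: str) -> None:
        # Example rule for column values: each value must be unique.
        if value in self.entries:
            raise ValueError(f"duplicate value: {value}")
        now = self._log("add", value)
        self.entries[value] = RegistryEntry(value, description, reference, now)

    def deprecate(self, value: str) -> None:
        # Raises KeyError if the value was never registered.
        self.entries[value].deprecated = True
        self._log("deprecate", value)

    def to_json(self) -> str:
        # Exports both the current content and its history.
        return json.dumps(
            {"entries": [asdict(e) for e in self.entries.values()],
             "history": self.history},
            indent=2,
        )


reg = Registry()
reg.add("example-format", "An example entry", "https://example.org/spec")
reg.deprecate("example-format")
print(reg.to_json())
```

Keeping the action log separate from page history is what would let tooling (or an API) replay or audit registry changes, which is the distinction drawn in the third bullet.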
dret commented

to be honest, i am a bit confused by the title of this issue: "We need a process for handling registries, APIs and other 'enumerations'"
i completely get the need for registries. how do APIs factor into this, and how are they related or something similar to registries?

We have a few APIs at the W3C that are 'mapping' APIs, and every time there is a new feature in the spec they map, a matching 'API' entry has to be inserted in the API registry, as I understand it. But I'd like someone to confirm (I don't manage such an API myself).

  • "A registry is a table that documents logically independent 'atoms'; conceptually a table with independent rows" -> it is also a set of rules for how to manage that table. often the table also has columns with specific meaning, such as timestamps, author info, or a "deprecated" label.

Thank you, added "** the rules for values (logically, the column values) in each entry (e.g. uniqueness, matching to a value from some other specification or registry, etc.)"

  • "Registries are purely documentational and contain no requirements." -> in most cases they do. most require the meaning of entries to be kept stable over time. most have additional requirements in terms of adding/changing/deprecating/removing entries.

Right, and we're saying that those rules are in the document, not (solely) in the registry, so that the rules get the review to the level required for the referencing document.

  • " hosted in a way that preserves history (e.g. Git, a Wiki)" -> it's rather different to keep an edit history of a web page that represents a registry (such as a wiki), or to keep a history of the actual registry changes. it may make sense to require the latter, so that for example machinery can be built around managing changes. in this day and age, one might even go as far as requiring an API that makes both the content as well as the history machine-readable.

Most Wikis I know keep version history, and obviously Git does. Do we need to establish that a record is kept directly of requests, as well?

  • in the "The registry:" section, maybe add the API, a published/machine-readable history of the registry actions, possibly provide an outline of standard columns (value, description, timestamp, deprecation, reference to definition, ...)

add what API?
added "* managed such that registration history (requests and actions) are archived (e.g. a W3C mailing list archive, pull request history, etc.)"

  • it might make sense to provide a "menu" of registry policies that w3c groups can choose from, so that they don't have to write their own. they may be allowed to write their own, but in most cases, they'd probably happily choose from a small set of standard choices.

yes, maybe, but at this level I am trying to tease out what the rules and recommendations are.

dret commented

Straw man proposal:

  • Registries are developed through the normal standards track, and are published on /TR just like other technical reports.
  • Registries are defined as part of a specification that defines an extensible/updatable table of items. A single REC-track specification can contain multiple of these (i.e. zero or more).
  • The section defining the registry must a) state that it is a Registry per the W3C Process, b) define the fields of its table of items, c) define the method and criteria by which changes are proposed and incorporated.
  • Changes to the registry other than adding/removing/updating entries in the registry go through the normal specification change process.
  • However, adding/removing/updating entries in the registry can be done through a lightweight process similar to how we handle editorial changes.

Hm, I think Elika's suggestions at #168 (comment) and mine at #168 (comment) are very similar...

I think this is a good starting point and agree with all but possibly the first point. I wonder about the use case of a CG owning a registry... Or maybe it's OK to insist that if a group wants an authoritative registry in /TR, they need to go through at least a streamlined version of chartering to ensure there is consensus to let them.

@michaelchampion Community Groups can publish Community Group Reports of whatever they want; afaict if they want to make a registry of stuff there's no reason they can't do so right now.

@dwsinger I think the main difference is that mine just inlines the registry into the /TR publication so that the questions of where and how it's hosted and what archiving and update mechanisms are involved are already solved. ;)

Update mechanics are important, and this is something that is better defined in @dwsinger's proposal although it would probably help to have some concrete working examples.

I think the only difference is minor, and we should merge when we resolve this:

  • I propose a 'W3C Registries' page which links to all registries managed under the policy, and they in turn link back to their enclosing document (Rec, CG Report, Note, whatever);

  • Elika proposes that the Registries be in /TR

A working registry that versions updates of each data element in the registry can be seen here. Notice also that each element has a status, such as "added" "published" "deprecated". This registry is already being used for large metadata sets.

I think that the broad definition of registries that seems to be in use here is going to be an impediment to development. Registration of ontologies will have different requirements from registration of datasets and from registration of data elements. I see a mixture of assumptions in the answers, and it seems we are not always talking about the same thing. Can we narrow or split our field so that the discussion is more focused?

I +1'ed a few of @fantasai's comments, which I believe are similar to some of @dwsinger's.

Items going into the registry should have been discussed as part of the document's progression through the Rec Track; so by the time a document is ready to move to the final stages of the Process the adding of items to the registry should have already been discussed. Therefore a lightweight method of adding items to a registry sounds sensible.

Registries are defined as part of a specification that defines an extensible/updatable table of items.

@fantasai are you suggesting that some subsections of Recs are actually inlined data sourced from other sources which can be updated in-place? So I might look at a spec one day, and see it say some text, and then the next day it shows something different?

Inlining seems both convenient for quick reading and also inconvenient because the updatable content might not be super-obviously subject to change. If someone references a dated version of a spec containing a registry they probably don't expect its contents to change. Requiring that a link is traversed in order to find the "current" version of a registry avoids this.

On balance I'd tend to avoid inlining registry content within Recs for this reason.

Another feature of a registry that would be useful is a change notification service, analogous to (or actually?) an RSS feed for that registry, that can be subscribed to so interested parties can make their own updates, e.g. to implementations, in a timely way.
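One way to picture such a change notification service is an ordinary RSS 2.0 feed generated from a registry's log of actions. The Python sketch below uses only the standard library; the function name `registry_feed` and the shape of the action records (`action`, `value`, `when`) are hypothetical, and a production feed would also want per-item links and GUIDs.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def registry_feed(title: str, link: str, actions: list) -> str:
    """Build a minimal RSS 2.0 document listing registry changes.

    Each action is a dict with hypothetical keys "action", "value" and
    "when" (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Changes to {title}"
    for a in actions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f'{a["action"]}: {a["value"]}'
        # RSS 2.0 expects RFC 822-style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(a["when"])
    return ET.tostring(rss, encoding="unicode")


print(registry_feed(
    "Example Registry",
    "https://example.org/registry",
    [{"action": "add", "value": "example-format",
      "when": datetime(2024, 1, 1, tzinfo=timezone.utc)}],
))
```

Subscribers would then poll the feed like any other, rather than diffing the registry page themselves.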

On balance I'd tend to avoid inlining registry content within Recs for this reason.

I agree with @nigelmegitt that inlining registry content could lead to some difficulties. I think registries need to be easily updatable.

dret commented
edent commented

You may be interested in the work the UK Government is doing with Registers.
https://gov.uk/registers

If I've understood this discussion correctly, this could be a model for registries.

For example, there is a canonical register of every country the UK Government recognises - https://www.registers.service.gov.uk/registers/country - it also includes countries which no longer exist, and metadata about them. Each register is maintained by a named owner, and they commit to regularly updating them. They're also cryptographically signed so that end users can be assured that they have not been compromised.

There's more detail at https://www.registers.service.gov.uk/about/characteristics-of-a-register - and I'm happy to link anyone up to the team which looks after them.

(I'm not the GOVUK rep any more - but still maintain an interest.)

@edent wrote

Each register is maintained by a named owner, and they commit to regularly updating them. They're also cryptographically signed so that end users can be assured that they have not been compromised.

I think that captures the essence of a "registry": there is some specific owner, presumably chosen based on qualifications, who is accountable for keeping the registry up to date and accurate, and a verification mechanism to ensure that updates are actually made by the owner.

@fantasai are you suggesting that some subsections of Recs are actually inlined data sourced from other sources which can be updated in-place?

Yes, effectively. Not in-place as in changing a dated publication, but in-place as in the "latest version" of the spec always includes the latest copy of the registry data.

Inlining seems both convenient for quick reading and also inconvenient because the updatable content might not be super-obviously subject to change.

A registry needs to be clearly labelled as such, to opt that data into the registry-update process. As I said, “The section defining the registry must a) state that it is a Registry per the W3C Process, b) define the fields of its table of items, c) define the method and criteria by which changes are proposed and incorporated.”

If someone references a dated version of a spec containing a registry they probably don't expect its contents to change. Requiring that a link is traversed in order to find the "current" version of a registry avoids this.

The dated version of a spec won't change. Only the undated one. Each change to the registry is an updated publication, just like any editorial change to the spec is an updated publication.

Another feature of a registry that would be useful is a change notification service, analogous to (or actually?) an RSS feed for that registry, that can be subscribed to so interested parties can make their own updates, e.g. to implementations, in a timely way.

I believe there is already an RSS feed for /TR documents. W3C might consider having per-spec RSS feeds as well. If there's a need for some more specialized data service, the WG can provide one however is convenient for the users and maintainers. There are plenty of websites out there serving copies of the ISO language codes and Unicode tables, for example; not every copy has to be served by ISO.

I think registries need to be easily updatable.

The group in charge of maintaining the registry should set up appropriate tooling, e.g. pulling data from a GH-hosted TSV once a day and, if there are changes, building it into a spec, and publishing it with Echidna. Or whatever. The registry can be inlined in the spec as a table, and/or served as a separate TSV, JSON, or other file data in the publication directory, same as other support materials like images and examples and indexes.
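As a rough illustration of the daily tooling described above, the Python sketch below turns a TSV file into an HTML table and uses a content hash to skip rebuilding when nothing changed. The TSV layout (first row holds the field names) and the function names are assumptions for this example; fetching the file from GitHub and publishing the result with Echidna are not shown.

```python
import csv
import hashlib
import html
import io
from typing import Optional, Tuple


def tsv_digest(tsv_text: str) -> str:
    # A content hash lets a daily job detect whether the data changed.
    return hashlib.sha256(tsv_text.encode("utf-8")).hexdigest()


def tsv_to_html_table(tsv_text: str) -> str:
    # First row is treated as the header (the registry's field names).
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))
    header, body = rows[0], rows[1:]
    out = ["<table>", "  <thead><tr>"]
    out += [f"    <th>{html.escape(h)}</th>" for h in header]
    out += ["  </tr></thead>", "  <tbody>"]
    for row in body:
        cells = "".join(f"<td>{html.escape(c)}</td>" for c in row)
        out.append(f"    <tr>{cells}</tr>")
    out += ["  </tbody>", "</table>"]
    return "\n".join(out)


def rebuild_if_changed(tsv_text: str,
                       last_digest: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (digest, html); html is None when the data is unchanged."""
    digest = tsv_digest(tsv_text)
    if digest == last_digest:
        return digest, None
    return digest, tsv_to_html_table(tsv_text)


tsv = "value\tdescription\nexample-format\tAn example entry\n"
digest, table = rebuild_if_changed(tsv, None)
print(table)
```

The same TSV could also be copied unchanged into the publication directory, so that the inlined table and the downloadable data come from one source.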

it might be worthwhile to think about the problem of traffic volume, though. if you end up having implementations that constantly pull the feed, that might create some issues for popular registries.

Scaling up hosting is an issue no matter what process we decide on, but ultimately we just need to host the official copy. W3C can facilitate serving copies of the data off of someone else's more robust server if needed (using appropriate data formats / APIs) and even offer a w3.org URL for the /TR documents to link to for high-traffic pulls, interesting data queries, and the like.

Hosting directly on /TR quickly and neatly solves the questions of what format, where to host, whether to design and build some new system for serving the data, and how to handle issues like archiving, longevity, branding, reputation, and authority. The issues of convenience and speed can be solved with mirrors.

now i understand: the registry is a TR, not just the document establishing it. just as food for thought: IETF has 2000+ registries (with of course far more values in them), with quite a number of updates happening. looking at this, maybe treating every update of a registry as something that triggers a published TR update might become relatively noisy.

We already have specs which are updated almost daily. This wouldn't be a new problem. And it would not be necessary to announce the registry updates. :) Announcements should be reserved for substantive changes to the framework of the registry, or if the WG particularly wants to announce some set of changes.

Importantly, we are already publishing registries like AAM through the spec publication process. This proposal just streamlines it so that these become practical to maintain.

I think another important question is where stability comes from. In the case of specifications, it comes from having multiple implementations that demonstrably match the spec, with the market relying on them so much that changing is generally not practical. Because of that, we don't really need a rule saying that updates to a REC must be compatible with the previous publication of this REC.

For registries there's no such thing. In the general case, there is no expectation that each entry in a registry will necessarily have multiple implementations. There are also different kinds of expectations for different registries:

  • some can be updated and changed, you just need to not be reckless about it, kind of like specs
  • some should be append only
  • some should allow appending and deprecating, but never removing or changing
  • Flexible initially, then append only (or append / deprecate) after certain maturity criteria have been met
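To make the differences between these policies concrete, here is a small Python sketch that compares two snapshots of a registry and reports changes a given policy would not allow. The policy names, the snapshot shape (`{value: {"description": ..., "deprecated": bool}}`), and the `violations` function are hypothetical; nothing here is defined by the Process.

```python
from enum import Enum


class UpdatePolicy(Enum):
    MUTABLE = "mutable"                    # entries may be added, changed or removed
    APPEND_ONLY = "append-only"            # entries may only be added
    APPEND_DEPRECATE = "append-deprecate"  # add or deprecate; never remove or change


def violations(old: dict, new: dict, policy: UpdatePolicy) -> list:
    """List the changes from `old` to `new` that `policy` does not allow."""
    if policy is UpdatePolicy.MUTABLE:
        return []
    problems = []
    for key, old_entry in old.items():
        if key not in new:
            problems.append(f"removed: {key}")
            continue
        new_entry = new[key]
        if policy is UpdatePolicy.APPEND_DEPRECATE:
            # Only the deprecated flag may change, and only from False to True.
            if old_entry.get("deprecated") and not new_entry.get("deprecated"):
                problems.append(f"un-deprecated: {key}")
            old_rest = {k: v for k, v in old_entry.items() if k != "deprecated"}
            new_rest = {k: v for k, v in new_entry.items() if k != "deprecated"}
            if old_rest != new_rest:
                problems.append(f"changed: {key}")
        elif new_entry != old_entry:
            # APPEND_ONLY: existing entries must stay exactly as they were.
            problems.append(f"changed: {key}")
    # New keys are allowed under every policy, so they are not checked.
    return problems


old = {"a": {"description": "A", "deprecated": False}}
new = {"a": {"description": "A", "deprecated": True},
       "b": {"description": "B", "deprecated": False}}
print(violations(old, new, UpdatePolicy.APPEND_DEPRECATE))
print(violations(old, new, UpdatePolicy.APPEND_ONLY))
```

The fourth option (flexible first, then append-only once maturity criteria are met) could be modelled as switching a registry's policy value at that point, which leaves open the question of who may approve such a switch.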

At which level do we want to enforce that:

  1. trust the consensus of the WG and its chairs to not do anything silly
  2. require each registry to have an "updating policy" section, which has to be followed (publication is denied if it isn't followed, it's valid ground for formal objections, etc)
  3. Charters must have an update policy for each registry the WG hosts
  4. The Process dictates what can be done

I kind of prefer something along the lines of 2, so that it's enforceable while still letting us account for the diversity of needs for different registries, but that raises the question of what the rules for changing the "updating policy" are. It feels that this is something that should be hard to do, but not necessarily impossible.

dret commented

i don't quite follow. what's the "implementation of an entry"?

That would depend on the type of registry. For some things (a list of languages), speaking of implementation doesn't make sense. For others it might: a list of video codecs is a list of things that can be implemented. But there probably wouldn't be an expectation that all UAs implement all video formats. The registry could be used by a capability-discovery API, so it would be expected that many entries would not be implemented by many implementations.

Regardless, the point is, for REC, something qualifies when it has 2 implementations (roughly speaking). For registries, that doesn't work.

some can be updated and changed, you just need to not be reckless about it, kind of like specs
in terms of breaking changes? i would disagree.

Again, that would depend on the type of registry. The example given earlier by @edent of the list of countries recognized by the UK government does change and update existing entries, and that's perfectly reasonable for that registry. The same policy would not be reasonable for a list of codecs.

  3. Charters must have an update policy for each registry the WG hosts

i am not quite sure what this means.

It means that if several policies are possible for maintaining a registry, it's not the working group who decides which one they will follow, but the Advisory Committee when they create (or update) the working group.

i am not sure it would be good to allow updates of the update policy.

I am quite sure that it would be bad to allow that to happen lightly. But you can never completely ban it: in the worst case, people can start a completely separate registry with the same information and a different policy. This is a human endeavor, and mistakes will be made. So when we realize that we made a mistake in the update policy of a particular registry, it would be good to have the option to fix it. So I think having some hard-but-possible path for changing the update policy would probably be a good thing.

Not in-place as in changing a dated publication, but in-place as in the "latest version" of the spec always includes the latest copy of the registry data.
...
The dated version of a spec won't change. Only the undated one.

@fantasai I'm struggling to understand this: isn't every "latest copy" an alias to a dated version? In your proposal, does an update to a registry automagically generate a new dated version of the Rec that references it, and then update the "latest" link to point to the updated Rec?

W3C might consider having per-spec RSS feeds as well. If there's a need for some more specialized data service, the WG can provide one however is convenient for the users and maintainers.

+1

  2. require each registry to have an "updating policy" section, which has to be followed (publication is denied if it isn't followed, it's valid ground for formal objections, etc)

@frivoal : +1 to option 2.

@edent thank you for the links, that's a really useful page. Text from that page:

each register has a named owner called a ‘custodian’.

In the case of W3C I think it would be reasonable to assign custodianship to a group as an alternative to an individual.

I have read the comments above and tried my hardest to incorporate them into a revised Wiki text at https://www.w3.org/wiki/Repositories. It would probably help me more if people suggested more specific edits (or wholesale replacement text)...

dret commented

@frivoal

At which level do we want to enforce that:
trust the consensus of the WG and its chairs to not do anything silly

Under my proposal, any changes to elements in the registry fall under this policy, because I'm equating them with the current procedures for editorial changes under the Process.

require each registry to have an "updating policy" section, which has to be followed (publication is denied if it isn't followed, it's valid ground for formal objections, etc)

Yes, in my proposal this is one of the requirements to declare a Registry.

Charters must have an update policy for each registry the WG hosts

I do not propose a requirement for this. A Charter could if it wanted to, though.

I kind of prefer something along the lines of 2, so that it's enforceable while still letting us account for the diversity of needs for different registries, but that raises the question of what the rules for changing the "updating policy" are. It feels that this is something that should be hard to do, but not necessarily impossible.

Under my proposal this would be a substantive change to the spec, just like any other substantive change.

@nigelmegitt

I'm struggling to understand this: isn't every "latest copy" an alias to a dated version? In your proposal, does an update to a registry automagically generate a new dated version of the Rec that references it, and then update the "latest" link to point to the updated Rec?

Exactly. Just like any other edit to a spec.

@fantasai

I'm struggling to understand this: isn't every "latest copy" an alias to a dated version? In your proposal, does an update to a registry automagically generate a new dated version of the Rec that references it, and then update the "latest" link to point to the updated Rec?

Exactly. Just like any other edit to a spec.

Ah, I see; that seems like a non-goal or possibly even undesired for registries in the uses I've seen for it.

Rather than wanting to automagically update a Rec it seems more common for folk to want to update a different document that's referenced by the Rec. I suspect the underlying reason is usually to avoid process delays and for the group maintaining the registry to feel empowered to make quick changes. It could be that an alternative way to meet those needs would be acceptable even if it involves updating a Rec, but I don't have any evidence for that one way or the other.

However, one disbenefit of your proposed approach is that external standards organisations often prefer that normative references point to dated versions of specifications. Anyone doing that would never get the updated registry entries. That might be a good or a bad thing, depending on the registry concerned, I guess.

dret commented

In "Background discussions", "Accessibility API Mappings" (AAM) is listed as a general example of a register. Though the AAM are a prime case for being registers, I think it's too specific a term to be included on that list. Suggest using "Mappings" as the general term, with the AAM as the illustrative example in the subsequent paragraph.

In the "Reference requirements" section it says that a register must be referenced by at least one W3C document. What is the process in the (albeit unlikely) event that a register is no longer referenced? Perhaps the obvious thing would be to make it obsolete per the current Process?

I prefer @dwsinger's suggestion of having a registers page, as opposed to including registers on /TR. They're different beasts, and combining them on /TR seems likely to confuse.

I also agree with concerns raised by @nigelmegitt and @tzviya, that pulling a register inline into an otherwise stable Recommendation is likely to be problematic, particularly when a Recommendation is used as a legal reference.

Lastly, is there a tipping point at which a table in one specification should transition into an independent register? When that table is referenced by one or more other specifications for example?

dret commented

Lastly, is there a tipping point at which a table in one specification should transition into an independent register? When that table is referenced by one or more other specifications for example?

I think the easy answer to this is that the WG should think about the trade-offs:

  • a table in the spec. can only be updated by following the process for updating a spec.; a register can have a lighter-weight, more rapid-response, mechanism;
  • updating a spec. means getting consensus of the group that owns the spec.; a register can have lighter-weight admission/approval criteria (if desired)

So, if a table is only rarely updated and then only by WG consensus, it might not benefit from being a registry. If anyone can be allowed to request new entries, and the admission criteria are reasonably easy to meet, then a registry may be preferred.

my suggestion would be to come to an initial design of the registry model for W3C,

That's exactly what I hope I have in the Wiki, but people don't seem to be commenting on or noticing it...

dret commented

Comments are welcome here, on this issue, or in new Process issues.

in other cases (and there are many examples in the existing registries out in the wild), the registry is not so much an inherent part of the spec that established it; it just happened to be established as part of the spec. in those cases, it seems that treating the registry through some inclusion process would not be a good way of taking advantage of the general idea of registries.

To be clear, I'm not arguing that registries can't be split out into their own /TR report with their own shortname and nothing but the rules around what the registry is about and the format and updating rules of each entry. Just that we should re-use the same publication and review mechanisms for things on /TR as much as possible (with some modifications to ease the updating of values in the registry), since publishing there

  • works reasonably well and is established already
  • addresses all the archiving concerns wrt stability of URIs across time and provision of historical data
  • also solves issues around finding and referencing (since there are both dated URLs and latest URLs)
  • does not overly-constrain the presentation of registries, since the spec editors can decide how they are formatted in the HTML and also post data files in however many convenient formats they want together with the publication

The one thing /TR is not good at is doing interesting queries against a large registry of values, but the types of queries one might want to do will vary by the registry so that is better handled by external services (which could be informatively linked from the /TR report) than by creating some new standardized service.

I think what distinguishes registries from standards development is that the purpose of registries is effective sharing of information/data, not consensus or other types of agreement. That said, there isn't a clear line here; in some cases there might be some level of agreement desired, but still a desire to use a registry process that's not designed for having agreement. In those intermediate cases a registry process might still be appropriate if precise definitions for eligibility can be written (e.g., "the value is defined by a specification at organizations A or B in states X or Y").

I tend to think the key tension with registries is between:

  • making registration easy enough that people (at least those aware of the registry) don't use unregistered values in the wild, and
  • ensuring the registry has the information needed by its users (such as knowing how to find the information needed about the values, or the information needed to avoid duplicate registrations).

Costs to updating a registry that don't help with the second point still increase the risk of the first, so it's important to keep unrelated costs of update low. (I think this is also closely related to what I think distinguishes a registry from standards development: the purpose is effective sharing of information/data, and not consensus or other types of agreement.)

I think it's also important in the W3C context that:

  • a registry can still be updated after the Working Group that created it has been closed
  • the W3C continues maintaining registries if the maintainer disappears, which probably requires that W3C know what registries it has

Also, for what it's worth, a few recent examples of registries being developed at W3C, from a 2017 email from the TAG to the AB, include:

and an older example is the XPointer Registry.

dret commented

I agree with many of your points. However, in response to:

That again depends on the registry. For example, if you have very limited value spaces (0-255, for example), you absolutely must manage them responsibly and probably with quite a bit of scrutiny.

To clarify: from the perspective of designing a process for registries, I don't think cases like that should be treated as use cases for a registries process, since they should probably use a standards process that involves that higher level of scrutiny from the community.

dret commented

Based on https://www.w3.org/wiki/Registries#Recommendation, the discussion here, and https://www.w3.org/wiki/Maintainable_Standards#Registries, @fantasai and I (mostly her) have drafted possible Process text to implement registries, using today's Process as the starting point.

The changes largely fall into two categories:

  • Defining what a registry, and a change to a registry, are. This is agnostic to the existing REC Track vs evergreen vs any other track we may eventually design.
  • Minimal tweaks to the REC track to allow registry updates without triggering transition calls or other overhead-heavy process steps.

You can preview the document with the changes incorporated here:
https://w3c.github.io/w3process/registries/

Or in diff form here:
https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fw3c.github.io%2Fw3process%2F&doc2=https%3A%2F%2Fw3c.github.io%2Fw3process%2Fregistries

The changes are in:

  • 6.2.5 on classes of changes
  • 6.3 which defines registries
  • 6.5.1 for no-overhead revisions to a CR for registry updates
  • 6.6 for allowing PR without implementation of all entries in a registry
  • 6.8.2.3 for allowing no-overhead revisions to a REC for registry updates

This is provided to help discussion on the basis of concrete text, not as a take-it-or-leave it offer.

dret commented

@dret remarks

the text suggests that a registry only exists as an embedded table, and not as a type resource.

I am sympathetic to this point and believe the sentence

A technical report may contain one or more such registries, either alone or in addition to other normative content.

is meant to say that Recommendations may contain Registries and also that Registries may be in Technical Reports that are not themselves Recommendations.

The intent (as I understand it) of the Florian/Elika proposal is to permit Recommendations to be easily updated when the update(s) are confined to the section identified as a Registry.

Similar point to @dret's #168 (comment)

Copied across from https://lists.w3.org/Archives/Public/public-w3process/2019Jun/0027.html to bring the conversations back here:

Thanks for this. There’s one feature of this proposal that I expect to cause friction:

This change envisages that registry content is included in RECs, and enforces that updates to registries are made by updating their containing REC. That in turn means that any dated reference to the REC will become outdated by a registry change when no other change has been made.

I’ve noted previously that this pattern does not fit with common usage of registries, where the registry content is referenced by the REC, which therefore does not need to change at all to accommodate changes in value.

The obvious get-out to address this would be to make a REC that only contains a registry, and reference that normatively from another REC. Introducing that as a pattern could work; we should be aware that it imposes a much higher bar for publication of registry content than has been set until now, where registries can take the form of a WG Note, a wiki page etc. I would expect some degree of push-back against that imposition on that basis.

I do not interpret this as forcing Registries to only be in Recommendations. If this remains a point of confusion then we should be explicit that Registries may be maintained in other ways, including outside of /TR. The general qualities of registries enumerated in Registry requirements should apply wherever the registry is located.

@swickr if the Registries are permitted not to be in Recommendations then that does indeed need to be clarified.

dret commented

I attempted to document the comments made in today's call in this wiki page change which allows for registries to be data sets captured in various forms.

dret commented

@nigelmegitt

I attempted to document the comments made in today's call in this wiki page change which allows for registries to be data sets captured in various forms.

Nice. I clarified to say that they may be represented as an HTML document, CSV, etc., not that they are such a thing: the registry is the data, not the representation.

@dret

I do understand that for some cases, the "embedded and always a TR" model of a registry looks good. I think in other cases this might not be quite as true.

I would be interested in examples for which having a canonical representation in an HTML document (in addition to any other convenient representations) is fundamentally incompatible with the use case.

Also, it might be better to have registries only work one way, not multiple ways, so that all registries can be found and used and updated and subscribed to in the same way.

Agreed. Which is why I proposed publishing them on /TR :)

dret commented

I have absolutely no idea how I would represent something like mp4ra.org (GitHub: https://github.com/mp4ra/mp4ra.github.io) as a 'document' which is 'published' on /TR. It boggles the mind. There are multiple pages, built from a database of CSV files by a GitHub build script.

What if the /TR entry was a document that linked to a data table inside a spec, SQL query, or whatever? Sort of like the Readme file at the top level of a GitHub repo?

Not advocating, just brainstorming.

dret commented

@dwsinger I may be missing something, but that site says it's managing the registration of "code points", yes? And there's a list of all the things that have been registered at http://mp4ra.org/#/atoms, yes? So you would copy out those tables into a document that includes (roughly speaking) the contents of http://mp4ra.org/#/request, mark it as a W3C Registry, and publish that document on /TR. If you prefer to split it out into multiple pages, you can also do that: we can have multi-page documents on /TR. You can also include, in the publication folder, copies of the CSV files (which you link to from the document so people can find them). And presumably you'd automate the whole process so that as soon as a commit goes through in the GH repo, everything gets rebuilt and posted to /TR via Echidna, the same way certain WGs automate republishing of their specs on /TR.

@dret

is that a trick question?

No, it's a serious one. You keep arguing that an HTML document is insufficient to represent a registry.

To me, one of the advantages of a process would be that registries would be uniform in a variety of ways, such as representations, where to find them, how they define/announce their management, and how developers can find histories and get update notifications. This may be hard to do in a coherent way when the link to the actual registry contents can point to all kinds of things.

Publishing in /TR answers all of these questions. They are found on /TR. The data is located in that publication, not on an external server, so that it is archived and as reliably available as w3.org itself. It defines and announces its management in that same document. Developers can find histories the same way they find the history of any other publication: through the dated version links. Update notifications via Atom/RSS feeds for specs, if they are not already available, should be simple to set up. These answers would be consistent for all W3C Registries.

Additional machinery around accessing and querying individual registry tables can be set up, but that's a question of improving tooling for convenience, and needn't be a prerequisite for establishing or using a registry: we already have all the basics covered on /TR.

dret commented

whether doing it that way is a good solution to the problem.

There are two kinds of good solutions: good solutions in the abstract, were we to design things from first principles without regard for how difficult they are to roll out, and good solutions in practice, considering what we are likely to achieve in a reasonable amount of time. Just because we can easily do something doesn't necessarily make it good, but if something is good and doable in practice, that's a strong candidate.

Publishing in /TR answers all of these questions.

It's possibly one answer.

The claim isn't that TR with some tweaks is the only reasonable way we could do this, but that it is a way, and that it is a way we can easily roll out, given that we already have most of it, both on the rules side and on the tooling side.

They are found on /TR.
That's different from having a registry page where you simply find all registries.

https://www.w3.org/TR/ lists everything on TR. https://www.w3.org/TR/?tag=css lists the subset of that that has anything to do with CSS. We could easily set up https://www.w3.org/TR/?tag=registry that would give you all TR entries that are or contain registries.

to find out about changes in the differences you have to do a diff?

You can:

  • look at the changes sections of the document
  • use https://services.w3.org/htmldiff
  • do a source diff
  • go find out from the document's headers where the source is maintained and look at the version history there
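For the "source diff" option, comparing two dated versions of a registry by entry key, rather than line by line, reports directly which entries were added, removed, or changed. A minimal Python sketch, assuming each version can be loaded as a mapping from entry key to column values; the keys and values below are hypothetical.

```python
from typing import Dict, List, Tuple

# A registry snapshot: entry key -> column values (hypothetical shape).
Snapshot = Dict[str, Dict[str, str]]


def diff_registry(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) entry keys between two dated snapshots."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    # "Changed" means the key exists in both snapshots but its values differ.
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return added, removed, changed


if __name__ == "__main__":
    v1 = {"example/a": {"spec": "Spec A"}, "example/b": {"spec": "Spec B"}}
    v2 = {"example/a": {"spec": "Spec A, revised"}, "example/c": {"spec": "Spec C"}}
    print(diff_registry(v1, v2))  # (['example/c'], ['example/b'], ['example/a'])
```

Keying on the entry rather than on document lines fits the "independent rows" view of a registry: reordering or reformatting the published table does not show up as a change.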

would the proposed update tell me about the actual change

If it's not already set up (maybe it is, but I cannot find it), it should be easy to set up an RSS feed for each technical report (spec, document, call them what you want) on TR, where each entry contains a copy of the abstract of the document and the changes section.
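A rough sketch of such a per-report feed, using Atom (RFC 4287) rather than RSS and only Python's standard library. The `Publication` fields, and the idea of copying the abstract into `summary` and the changes section into `content`, are assumptions for illustration, not an existing W3C service.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the Atom namespace, in ElementTree's {uri}tag form."""
    return f"{{{ATOM}}}{tag}"


class Publication(NamedTuple):
    """One dated version of a technical report (hypothetical fields)."""
    url: str       # dated version URL, used as the entry id
    updated: str   # RFC 3339 UTC timestamp, e.g. "2019-06-20T00:00:00Z"
    abstract: str  # copy of the document's abstract
    changes: str   # copy of the document's changes section


def build_feed(title: str, feed_id: str, author: str, pubs: List[Publication]) -> str:
    """Build an Atom feed with one entry per dated publication, newest first."""
    feed = ET.Element(q("feed"))
    ET.SubElement(feed, q("title")).text = title
    ET.SubElement(feed, q("id")).text = feed_id
    ET.SubElement(ET.SubElement(feed, q("author")), q("name")).text = author
    # Timestamps in the same UTC format sort correctly as plain strings.
    ordered = sorted(pubs, key=lambda p: p.updated, reverse=True)
    if ordered:
        ET.SubElement(feed, q("updated")).text = ordered[0].updated
    for p in ordered:
        entry = ET.SubElement(feed, q("entry"))
        ET.SubElement(entry, q("title")).text = f"{title} ({p.updated[:10]})"
        ET.SubElement(entry, q("id")).text = p.url
        ET.SubElement(entry, q("link"), href=p.url)
        ET.SubElement(entry, q("updated")).text = p.updated
        ET.SubElement(entry, q("summary")).text = p.abstract
        ET.SubElement(entry, q("content")).text = p.changes
    return ET.tostring(feed, encoding="unicode")
```

Since every dated version on /TR already has its own URL, the feed needs no extra storage: it can be regenerated from the publication history each time a new version is posted.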

I don't quite grasp your utter confidence that TR is all that is ever needed.

I don't think the point is that TR is the only way we could ever solve that problem, but:

  • Used right, it does seem to solve all the use cases we've said we wanted to solve
  • If we want to enable this process soon (in the next 6 months rather than in the next 5 years), choosing a way that needs minor additions over what we already have seems better than a way that needs us to write everything from scratch.

@dwsinger

Matches one of the things I think should be possible. As long as it keeps history, is backed up, etc. see the wiki ;-)

I'm getting the sense that you would like the process to be the abstract requirements, so that any process/tooling combination that fulfills them all would be valid to use. Is that right?

I think this is misguided, as it doesn't actually solve the problem, and just passes it down to people who want to maintain a registry. Each person who wants to maintain a registry then has to come up with an actual process/tooling for doing so and check with the Team whether their particular instantiation fulfills all the requirements. (Do we need to define the process for checking that the process/tooling is acceptable by the meta process?)

We need to put in the process a particular instantiation of the principles in the wiki, not the abstract principles themselves. Not "you can use anything you want, as long as it maintains history, and has properties foo and bar", but "use this; it maintains history, and has properties foo and bar". Otherwise we're not writing a process for registries, but a dictionary definition of registries.

(or perhaps that's what you meant too, but I was becoming unsure).


As for mp4ra, I don't see what the complexity is either. For sure it is large, but I don't see what aspect of it is in conflict with the proposed process.

  • As far as I can tell, this is just a bunch of tables.
  • The http://mp4ra.org/ website presents these tables in a number of sub-sections (http://mp4ra.org/#/atoms, http://mp4ra.org/#/brands), but specs can have subsections as well, served from different URLs if we want to (https://www.w3.org/TR/CSS22/selector.html vs https://www.w3.org/TR/CSS22/colors.html)
  • The source is maintained in csv files, with build scripts that generate the registry site from them, but so what? @fantasai's proposal does not dictate in any way what tooling you use to build your registry. We can teach bikeshed to read csv files if we want to.
  • A number of table entries cross-reference each other with hyperlinks. So what? We can do that in HTML, and tools like bikeshed make that pretty convenient.
  • It has a search function at http://mp4ra.org/#/search. This one's a little more fuzzy, but:
    • The Process doesn't forbid us from including a similar JS-powered search function into a spec. PubRules might, but that's a question for PubRules, not for the Process.
    • Arguably, the search function isn't part of the registry itself, it's just a service provided that uses the registry as its input. If, for example https://www.w3.org/TR/uievents-code/ was a registry (and even if it isn't), nobody would prevent us from writing https://www.uievents-code.org, and from having a /search page in there if we find it convenient.
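As an illustration of why a CSV source is not an obstacle: rendering CSV rows as an HTML table for inclusion in a /TR document takes only a few lines of standard-library Python. This is a hypothetical sketch; it does not reflect how Bikeshed or the mp4ra build scripts actually work, and the sample columns are made up.

```python
import csv
import io
from html import escape


def csv_to_html_table(csv_text: str) -> str:
    """Render CSV text (first row = header) as an HTML table, escaping every cell."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{escape(h)}</th>" for h in header) + "</tr>")
    for row in body:
        lines.append("  <tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "code,description\nabcd,Hypothetical box type\nwxyz,Another hypothetical entry\n"
    print(csv_to_html_table(sample))
```

The CSV files stay the editable source; the generated HTML is just one more representation that can be published and archived alongside them.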

to find out about changes in the differences you have to do a diff?

In case folk aren't aware, there's a whole discipline (sometimes called "Master Data Management") that deals with managing reference data sets and their evolution over time, as well as managing the combination of different sources of the same data when it is unclear which is authoritative.

One approach that is often taken is never to delete a data point, instead marking the validity of each point with some time range.

This really helps:

  1. to answer queries like "what would the answer have been if I'd asked on 1st June 2019?";
  2. to highlight re-use of data points that had a different meaning historically, so that a reasoned decision can be made about whether that re-use is a good idea; and
  3. to publish additions with a "becomes valid" date in the future, so that planned changes can be synchronised.

This is important when we think about the proposal to publish the registry data sets as HTML documents. We could avoid the need for a diff tool or a change set by requiring this validity data on each data point, and then including, as part of a standard template, a filtering option so that any arbitrary version of the data set can be presented without additional tools.
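A minimal sketch of the validity-range approach, assuming a hypothetical schema where each data point carries a `valid_from` date and an optional `valid_until` date and is never deleted. `as_of` answers point-in-time queries (including future-dated additions), and `reused_keys` flags keys whose meaning changed over time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Entry:
    """One registry data point with a validity range (hypothetical schema)."""
    key: str
    meaning: str
    valid_from: date
    valid_until: Optional[date] = None  # None = still valid; entries are closed, never deleted


def as_of(entries: List[Entry], when: date) -> List[Entry]:
    """Entries valid on `when`: valid_from is inclusive, valid_until is exclusive."""
    return [
        e for e in entries
        if e.valid_from <= when and (e.valid_until is None or when < e.valid_until)
    ]


def reused_keys(entries: List[Entry]) -> List[str]:
    """Keys that have carried more than one meaning over time, for a reasoned review."""
    meanings: Dict[str, Set[str]] = {}
    for e in entries:
        meanings.setdefault(e.key, set()).add(e.meaning)
    return sorted(k for k, m in meanings.items() if len(m) > 1)


if __name__ == "__main__":
    registry = [
        Entry("x1", "old meaning", date(2015, 1, 1), date(2019, 1, 1)),
        Entry("x1", "new meaning", date(2019, 1, 1)),
        Entry("y2", "planned entry", date(2030, 1, 1)),  # future-dated addition
    ]
    print([e.meaning for e in as_of(registry, date(2019, 6, 1))])  # ['new meaning']
    print(reused_keys(registry))  # ['x1']
```

Because nothing is ever deleted, the same list serves as both the current registry and its full history; a filtering control in a published HTML view would only need to call something like `as_of` with the chosen date.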

(I am not claiming to be an expert on master data management, my knowledge is an artefact of a previous job!)

@nigelmegitt The kind of tooling you describe seems useful, but I don't think it needs to be built into the registry. It operates on the content of the registry, so as long as that content has an agreed-upon, machine-processable format and revisions in the registry are dated, that kind of tool can be built.

Registries have been described as an urgent need. I think we should be careful not to overengineer what we're doing. The core idea of a registry is quite simple. Quoting the wiki:

A registry is a data set that documents logically independent 'atoms'; conceptually a table with independent rows, and rules for the values in the columns

Now, on top of that, lots of things can be built. And maybe some of the things that can be built should be built by W3C to make it easier to work with registries. But in the end, the registry is still a (set of) table(s) with rules on what goes in there and how to update them, and version history. We need to get that part right, and the rest can be built on top.
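The wiki definition (independent rows plus rules for the values in each column) is small enough to sketch directly. In this hypothetical Python illustration the column names, the rules, and the choice of `name` as the unique key are all assumptions; each real registry would define its own.

```python
import re
from typing import Callable, Dict, List

# Hypothetical column rules: column name -> predicate its value must satisfy.
Rules = Dict[str, Callable[[str], bool]]

RULES: Rules = {
    "name": lambda v: re.fullmatch(r"[a-z][a-z0-9-]*", v) is not None,
    "spec": lambda v: v.startswith("https://"),
}


def register(table: List[Dict[str, str]], row: Dict[str, str], rules: Rules = RULES) -> List[str]:
    """Check one row against the column rules and append it to `table` if it passes.

    Rows are independent, so apart from uniqueness of the (assumed) "name" key,
    only the row itself is checked. Returns the problems found; [] means registered.
    """
    problems = [f"missing column: {c}" for c in rules if c not in row]
    problems += [f"invalid {c}: {row[c]!r}" for c, ok in rules.items() if c in row and not ok(row[c])]
    if any(existing.get("name") == row.get("name") for existing in table):
        problems.append(f"duplicate name: {row.get('name')!r}")
    if not problems:
        table.append(row)
    return problems


if __name__ == "__main__":
    table: List[Dict[str, str]] = []
    print(register(table, {"name": "example-entry", "spec": "https://example.org/spec"}))
    print(register(table, {"name": "Example Entry", "spec": "http://example.org/spec"}))
```

Because each row is checked only against the column rules and the key, accepting a registration needs no review of the rest of the table, which is what keeps update costs low.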

dret commented

Now, on top of that, lots of things can be built.

@frivoal some things are very hard to add later. Especially for registries, if we think that managing the lifecycle of entities in a registry is important, then one cost of adding that later is that there will probably be a loss of data quality.

If the representation of the data set happens to be a document managed under an archived space, such as /TR, then it may be possible to compute this data later, with some hope of accuracy. If there is only API access then it would be extremely difficult.

There have been a variety of use cases for "registries" presented. Not all of these require the strict data history management and specialized atomic APIs that others do. If we insist on a system that has those requirements, then it becomes harder to use for the cases that don't need it. One of the benefits of defining it through /TR is flexibility.

I'd like to point out that Unicode has, in effect, a lot of registries about its code points detailing various properties of characters and the like. They are officially published as text files, because that is a stable and easily parseable format. Other services such as https://unicode.org/cldr/utility/ and http://www.fileformat.info/info/unicode/ wrap tooling around that, providing useful ways to look at the data. Implementations import the data files regularly. But these interfaces to the data are not the canonical publication of that data.
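To show what "easily parseable" means in practice, here is a minimal sketch of importing UnicodeData.txt-style lines, assuming its semicolon-separated layout (code point in hex, character name, general category, then further fields). It keeps only the first three fields and does not expand the `First`/`Last` range entries, so it is an illustration rather than a complete UCD parser.

```python
from typing import Dict, Iterable, Tuple


def parse_unicode_data(lines: Iterable[str]) -> Dict[int, Tuple[str, str]]:
    """Map code point -> (character name, general category).

    Expects UnicodeData.txt-style lines: semicolon-separated fields, the first
    three being code point (hex), name, and general category. Blank lines are skipped.
    """
    table: Dict[int, Tuple[str, str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table


if __name__ == "__main__":
    sample = [
        "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
        "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    ]
    print(parse_unicode_data(sample)[0x41])  # ('LATIN CAPITAL LETTER A', 'Lu')
```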

One of the primary purposes of publishing an official registry through W3C rather than on a private server somewhere is to have a canonical publication that's basic enough to be readable and archival and importable and reusable. Hosting data somewhere other than w3.org doesn't satisfy this. Interfaces like the mp4ra.org query system are nice, but query systems are not fundamental. We need to solve the fundamental requirement: to provide the official data in a way that is consistent and continuous. Everything else is just incremental improvement in tooling.

I'm not saying we shouldn't build improved tooling. But if we need to build an entirely new system that has the same stability and consistency guarantees as /TR as a prerequisite for solving this problem, then we're not going to get anywhere soon. And we'll either have to shoehorn anything that doesn't quite fit into that specialized system to match its inputs and outputs, or be unable to handle it as a registry.

dret commented

I'm getting the sense that you would like the process to be the abstract requirements, so that any process/tooling combination that fulfills them all would be valid to use. Is that right?

I think this is misguided, as it doesn't actually solve the problem, and just passes it down to people who want to maintain a registry. Each person who wants to maintain a registry then has to come up with an actual process/tooling for doing so and check with the Team whether their particular instantiation fulfills all the requirements. (Do we need to define the process for checking that the process/tooling is acceptable by the meta process?)

We need to put in the process a particular instantiation of the principles in the wiki, not the abstract principles themselves. Not "you can use anything you want, as long as it maintains history, and has properties foo and bar", but "use this; it maintains history, and has properties foo and bar". Otherwise we're not writing a process for registries, but a dictionary definition of registries.

OK, so, yes, I want to agree on what the rules are, and write them in the (relatively hard to change) process.

Yes, while we're learning, I want to leave as much flexibility as we can so that we learn as much as possible. I do not wish to have over-constraining rules. Part of this is the humility that I might not have realized a valid use case or useful solution.

Yes, I want to agree on the rules that we need before we dive into solutions that satisfy those rules. I thought you were disagreeing about the rules; instead, you want to bless your preferred solution (and implicitly deprecate others').

I am completely supportive of developing one or more concrete sets of infrastructure that satisfy the rules.

I would like to make it possible that existing quasi-registries could become, with small amounts of effort, Registries as defined and prescribed by the W3C process. So, for example, a Wiki could host a registry as long as the defining document and the registry have the right material.

I think we will need some guidance documents and tools that can be handled much more flexibly than the formally approved process. One such guidance could be "how to manage a registry AS a section in a document on /TR." (Basically, say something like "this section constitutes a Registry [[as defined in the process]] and is updated according to the update process for Registries. Updates to this section -- the Registry -- can occur without a change of name, version, or publication date of the document.")

dret commented

For completeness, I wanted to add this inventory of W3C related registries:

The W3C Credentials Community (https://w3c-ccg.github.io) has been maintaining several registries (at https://github.com/w3c-ccg) because we have been around for a long time, outlasting multiple WGs; we do not terminate or expire; and we are quite active (weekly meetings with 20+ people from many different companies).

We are informally using this work process for them: https://lists.w3.org/Archives/Public/public-credentials/2017Dec/0020.html but the plan is to formalize it, have our community formally approve it, and move our final registries process here: https://github.com/w3c-ccg/registries-process

We initially inherited some cryptography-related registries from the Web Payments, Verifiable Credential, and JSON-LD WGs, as these groups were not chartered to do crypto:

We have been asked by the existing W3C Verifiable Claims Working Group chairs to maintain these Verifiable Credentials related registries, as the WG will hopefully soon be complete and they need someone to maintain them after that WG winds down.

This is the first registry for the evolving Decentralized Identifier specification, which will hopefully soon have an official WG. I anticipate there will be more added, and since these need to be long-lived, the intent is that they will stay in the W3C-CCG.

-- Christopher Allen, co-chair, W3C Credentials CG

cc: @jandrieu, @kimdhamilton, @msporny, @burnburn, @stonematt

I agree that abstract rules without any clue on how to meet them are probably unhelpful as a way to get things going. But I also feel that as we learn how to manage registries, we should set the rules such that they express only what we must have, and leave as much latitude as possible for learning, modes of working, and so on. I am particularly keen that it should be as easy as possible for groups with "proto registries" to make the changes needed to become a Registry. So if they are already inline, that means minor edits to the document; if they are already in a Wiki, minor edits to the document and Wiki; and so on.

So I suggest we write a crisp process-like section that expresses the rules and eschews verbosity and examples; but support it with a guidelines document that we can update, clarify, use to provide examples, and so on, that helps people get going.

I did those edits in the Wiki page.

@dwsinger Random wiki systems don't have the same longevity and archival support that /TR does. If a group wants to use a wiki as the intake system and automatically copy to /TR, that's fine, but I don't think wikis are sufficient for something that has normative Recommendation-type status at W3C.

Registries don't have normative specification status. Not everything is a Rec, you know. You seem fixated on something magical about /TR? If Wikis don't meet the requirement of being archived (we know they maintain history), then we either need to fix that or not use them.

dret commented