whatwg/html

Make underscore-prefixed attributes formally valid (works AS IS in all existing browsers)

Marat-Tanalin opened this issue · 13 comments

It makes sense to make underscore-prefixed (_) custom HTML attributes formally valid:

<div _foo="bar" _lorem="ipsum" _boolean></div>
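To show roughly what would change for a conformance checker, here is a minimal JavaScript sketch that compares the current custom data attribute rule with the proposed underscore rule. Both predicates are simplified and hypothetical. The spec's XML-compatibility requirement for data- attributes is left out. Disallowing uppercase in _ names is an assumption borrowed from data-; it is not part of this proposal.

```javascript
// Current rule, simplified: "data-" plus at least one more character and no
// ASCII uppercase letters. The spec's XML-compatibility check is omitted here.
function isConformingDataAttribute(name) {
  return name.startsWith('data-') && name.length > 'data-'.length && !/[A-Z]/.test(name);
}

// Hypothetical rule under this proposal: "_" plus at least one more character.
// Disallowing uppercase is an assumption mirrored from the data- rule.
function isProposedUnderscoreAttribute(name) {
  return name.startsWith('_') && name.length > 1 && !/[A-Z]/.test(name);
}

for (const name of ['_foo', '_lorem', '_boolean', 'data-foo', 'ng-click']) {
  console.log(name, isConformingDataAttribute(name), isProposedUnderscoreAttribute(name));
}
```

Under the current rule, only data-foo passes. Under the proposal, the three attributes in the example would pass too, and ng-click would stay non-conforming under both.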

Facts and advantages

  • Works in all existing browsers, so no implementor effort is needed. It’s solely about formal validity.
  • 100% issue-free and future-proof: standard attribute names will certainly never start with an underscore.
  • One character long (5 times shorter than data-).
  • A generic prefix with no specific, constrained meaning, unlike data- (custom attributes are for data only to the same extent that all standard attributes are) or any other letter-based prefix such as my-.
  • Provides a way to use an unlimited number of prefixes, as needed (and only if needed) by a specific web app.
  • There was some acceptance (namely by Lea Verou, Alexander Farkas, Taylor Hunt aka tigt, and Martin Janecke aka prlbr) of this idea in a related thread at the WICG forum.
  • As a bonus, hyphen-free attribute names can be selected entirely with a double-click in code editors.

The data- prefix is omitted/ignored in practice

The currently allowed data- prefix is in fact too long (5 times longer than _). It is therefore omitted or ignored in practice (e.g., the AngularJS library uses ng-* attributes), so HTML markup inevitably becomes invalid anyway and, unlike markup using the underscore prefix, is not future-proof.

No interrelation with data- and dataset

Underscore-prefixed attributes should not affect data- prefixed attributes and/or the dataset DOM property in any way, and vice versa. _ and data- prefixed attributes should just coexist.

Hyphen prefix as an option

Alternatively, a hyphen (-) prefix could be used instead of an underscore (_). Both work in all browsers and both are future-proof, so both should be formally valid, but the underscore alone would be enough.

Also, of the two, the underscore prefix is reportedly the only XML-compatible option, as Simon Pieters (@zcorpan) has also confirmed.
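The XML point depends on which characters may start an XML Name: "_" may, "-" may not. Below is a minimal sketch restricted to the ASCII subset of the XML 1.0 NameStartChar and NameChar productions. The non-ASCII ranges are omitted, and the function names are illustrative. HTML's definition of an XML-compatible attribute name additionally excludes the colon.

```javascript
// ASCII-only subset of XML 1.0 NameStartChar and NameChar (non-ASCII omitted).
const NAME_START_CHAR = /^[A-Za-z_:]$/;
const NAME_CHAR = /^[A-Za-z0-9_:.-]$/;

function isAsciiXmlName(name) {
  if (name.length === 0 || !NAME_START_CHAR.test(name[0])) return false;
  return Array.from(name.slice(1)).every((ch) => NAME_CHAR.test(ch));
}

// In HTML, an XML-compatible attribute name is an XML Name containing no ":".
function isXmlCompatibleAttributeName(name) {
  return isAsciiXmlName(name) && !name.includes(':');
}

console.log(isXmlCompatibleAttributeName('_foo')); // true
console.log(isXmlCompatibleAttributeName('-foo')); // false: "-" cannot start a Name
```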

Working in browsers is not related to document conformance, you're right. This is indeed entirely about document conformance. But I don't think there's benefit in splitting the ecosystem into two prefixes (_ and data-) just because some people don't like typing the extra four characters. If you care about conformance you're probably willing to invest that extra effort. If you don't (like Angular etc.), you probably won't use any prefix at all.

@domenic One of the main points of HTML5 compared with XHTML was to be based on speccing existing implementations and removing artificial limitations that don’t correspond to reality. The invalidity of underscore-prefixed attributes is an artificial limitation.

To be clear, that’s not about trying to change the approach already used in existing libraries like Angular, but about preventing newer web-based products from being forced to ignore validity over and over again.

Fwiw, I have always cared about validity, but I currently use underscore-prefixed attributes when adding them dynamically via JS (so the validator simply cannot see them). Making underscore-prefixed attributes formally valid would let their use spread to static HTML by removing an artificial limitation that does not correspond to reality.

I believe most web developers aware of the change would immediately and completely stop using the data- prefix in favor of the _ prefix once the latter is legitimized. So there would be no split, just the _ prefix: no one needs the long data- prefix once a 5-times-shorter option is available.

One of the main points of HTML5 compared with XHTML was to be based on speccing existing implementations and removing artificial limitations that don’t correspond to reality. The invalidity of underscore-prefixed attributes is an artificial limitation.

I think you're confusing the implementation-facing aspects of HTML with the document conformance requirements. From one perspective, every document conformance requirement is an artificial limitation. But these limitations are in place for good reason! In addition to the reason I mentioned above for this specific case (not splitting the ecosystem into two approaches for custom data attributes), see also https://html.spec.whatwg.org/#conformance-requirements-for-authors for a more general set of reasons why we impose these restrictions. They may be artificial, but they're not arbitrary, and they definitely correspond to reality!

@domenic Redirecting to a large document is typically not too helpful. Quoting a relevant minimal part right here could be.

Fwiw, note that not all custom attributes are data attributes. For example, I often use boolean custom attributes (with no values). These are different paradigms.

I'm sorry, if you're not able to read the spec you're filing bugs on, especially the small section you're discussing changing, we're not going to have a very productive conversation :(

@domenic Please be more specific if you really intend to be helpful and not just to practice eloquence. Even if I read the entire spec, it may still be unclear what exact part of it you mean; one does not exclude the other. If needed, feel free to contact me privately to prevent polluting the thread. Thanks.

They may be artificial, but they're not arbitrary, and they definitely correspond to reality!

Reality is that underscore-prefixed attributes work in all existing browsers, but the spec does not account for this. And while using absolutely arbitrary names of custom attributes would be unsafe due to potential conflicts with same-name attributes standardized in the future, the underscore prefix will certainly never be used for standard attributes, so it is absolutely safe.

For example, I often use boolean custom attributes (with no values).

That is perfectly valid use of data- attributes. Would it help to include an example in the spec?

Is there a reason you don't use the dataset API when adding attributes via JS? It is less to type than using underscore prefix with setAttribute:

elm.setAttribute('_foo', 'bar'); // sets the _foo attribute
elm.dataset.foo = 'bar';         // sets the data-foo attribute

I understand that the API doesn't help for markup or in selectors, though.
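For reference, this is roughly how the dataset setter turns a property name into an attribute name, which is why elm.dataset.foo = 'bar' ends up as data-foo. Below is a minimal sketch of that mapping as plain string logic, with no DOM involved. The function name is illustrative. Browsers throw a "SyntaxError" DOMException where this sketch throws a plain SyntaxError, and the spec's final check that the result is a valid attribute name is left out.

```javascript
// Maps a dataset property name to the content attribute it reads and writes.
function datasetPropertyToAttribute(prop) {
  // "-" followed by a lowercase ASCII letter would not round-trip through the
  // getter (data-foo-bar is read back as fooBar), so it is rejected.
  if (/-[a-z]/.test(prop)) {
    throw new SyntaxError(`'${prop}' is not a valid dataset property name`);
  }
  // Each ASCII uppercase letter becomes "-" followed by its lowercase form.
  return 'data-' + prop.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

console.log(datasetPropertyToAttribute('foo'));    // data-foo
console.log(datasetPropertyToAttribute('fooBar')); // data-foo-bar
```

Nothing in this mapping ever produces an _-prefixed name. Under the proposal, _foo would remain reachable only through getAttribute/setAttribute.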

Reality is that underscore-prefixed attributes work in all existing browsers, but the spec does not account for this.

It accounts for it in that it requires them to work in UAs.

And while using absolutely arbitrary names of custom attributes would be unsafe due to potential conflicts with same-name attributes standardized in the future, the underscore prefix will certainly never be used for standard attributes, so it is absolutely safe.

Any prefix will work in browsers and most of them will never conflict with future additions to HTML... In the WICG thread it has already been suggested to allow other prefixes. Do you think people will be happy with the underscore, or still use some other prefix because underscore is ugly (or whatever)?

I think this is a tradeoff between

  • splitting the ecosystem into two (or more) prefixes
  • confusing developers who assume dataset will work with the new prefix
  • slippery slope to allowing even more prefixes

and

  • less typing (in markup and selectors cases)

@Marat-Tanalin, my browser had great trouble loading the full spec to view that small section, so here is a link to that section from the multipage document: (https://html.spec.whatwg.org/multipage/introduction.html#conformance-requirements-for-authors).

Do you think people will be happy with the underscore, or still use some other prefix because underscore is ugly (or whatever)?

@zcorpan, that is a good point. Developers who wish to be compliant and also maintain readable source may be happy, as evidenced by some of the prominent individuals who supported the idea in the WICG thread. Other developers find the extra word/characters distracting to the flow of their documentation; Google and their Angular library may be one example. A similar example may be the removal of xlink in <use xlink:href>.

I don't think there's benefit in splitting the ecosystem into two prefixes (_ and data-) just because some people don't like typing the extra four characters.

@domenic, could _ and data be synonymous? The data prefix was chosen because the attribute is intended to store custom data private to the page or application. The underscore (_) as a synonym for data might be appropriate, as it is a common pattern for private data.

@jonathantneal they could be, but it would complicate the processing model and not offer much new utility.

What I think we want to do with attributes is enable hooks for them just as we did with custom elements. And basically allow any attribute with a hyphen (except for the couple that were already minted) to be used as custom global attribute, with the appropriate JavaScript binding. That would offer something to developers (just like data- did with its fairly basic API) and might entice them to stick to the suggested naming scheme.
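The following is a minimal sketch of what a custom-attribute registry along those lines might look like, modeled loosely on customElements.define. Everything in it is hypothetical: no such API exists in browsers, and the class and method names are invented for illustration. The reserved list (accept-charset and http-equiv, plus the data- and aria- families) is an assumption about which hyphenated names count as "already minted". notify() only stands in for the user agent reacting to attribute changes.

```javascript
// Hypothetical: hyphenated attribute names HTML already uses (assumed list).
const RESERVED_HYPHENATED = new Set(['accept-charset', 'http-equiv']);

// Hypothetical registry; not a real browser API.
class CustomAttributeRegistry {
  #definitions = new Map();

  define(name, callbacks) {
    if (!name.includes('-')) {
      throw new SyntaxError(`'${name}' must contain a hyphen`);
    }
    if (RESERVED_HYPHENATED.has(name) || name.startsWith('data-') || name.startsWith('aria-')) {
      throw new SyntaxError(`'${name}' is reserved`);
    }
    if (this.#definitions.has(name)) {
      throw new Error(`'${name}' is already defined`);
    }
    this.#definitions.set(name, callbacks);
  }

  // Stand-in for the UA invoking the definition when the attribute changes.
  notify(name, oldValue, newValue) {
    const definition = this.#definitions.get(name);
    if (definition && typeof definition.changed === 'function') {
      definition.changed(oldValue, newValue);
    }
  }
}

const registry = new CustomAttributeRegistry();
registry.define('my-tooltip', {
  changed(oldValue, newValue) {
    console.log(`my-tooltip: ${oldValue} -> ${newValue}`);
  },
});
registry.notify('my-tooltip', null, 'Hello');
```

Requiring a hyphen mirrors how custom element names are kept apart from standard ones. The binding (the callbacks object) is what would give developers a reason to follow the naming scheme.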

@zcorpan Hello, Simon. Thanks for the substantive comment.

Is there a reason you don't use the dataset API when adding attributes via JS?
I understand that the API doesn't help for markup or in selectors, though.

Right, I use custom attributes in selectors. And I would also like to use them in markup immediately once they are legitimized. Also, I find the dataset magic somewhat confusing in terms of mapping it to corresponding attributes and prefer to deal with attributes directly.
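On the reading side, the mapping works roughly like this. An attribute contributes a dataset entry only if its name starts with data- and the remainder has no ASCII uppercase letters. Then each hyphen followed by a lowercase ASCII letter is dropped, and that letter is uppercased. Below is a minimal string-only sketch of that rule (the function name is illustrative). It also shows why an _-prefixed attribute would never appear in dataset.

```javascript
// Returns the dataset property name an attribute is exposed as, or null if
// the attribute does not appear in dataset at all.
function attributeToDatasetProperty(attrName) {
  if (!attrName.startsWith('data-')) return null;
  const rest = attrName.slice('data-'.length);
  if (/[A-Z]/.test(rest)) return null;
  // "-x" (x a lowercase ASCII letter) becomes "X"; any other hyphen is kept.
  return rest.replace(/-([a-z])/g, (_, ch) => ch.toUpperCase());
}

console.log(attributeToDatasetProperty('data-foo-bar'));  // fooBar
console.log(attributeToDatasetProperty('data-foo--bar')); // foo-Bar
console.log(attributeToDatasetProperty('_foo'));          // null
```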

It accounts for it in that it requires them to work in UAs.

Right, but that is obviously not what I mean. _-prefixed attributes de facto work in all browsers, and such names will not be used for standard attributes, yet they are formally invalid for no practical reason; this should be fixed.

In the WICG thread it has already been suggested to allow other prefixes.

I believe such other prefixes are proposed not because they are actually good or usable, but just as a suboptimal partial solution of the “better than nothing” category.

Actually, I would continue to use underscore as long as other formally valid options are longer than one character, so introducing such options wouldn’t help much.

Fwiw, for me the big advantage of the _ prefix is that it is generic by nature: it is hardly even a prefix, just a short and simple way to clearly separate custom attributes from standard ones. The same applies to the hyphen (-) prefix. I just like the underscore more, and as a bonus, the underscore is reportedly the only XML-compatible option of the two.

@jonathantneal Hello, Jonathan.

could _ and data be synonymous?

While such synonymization might sound good at first glance, it would probably lead to confusion and complexity, e.g., in deciding which takes priority when both _ and data- attributes (e.g., _foo and data-foo) are set. That’s why I explicitly pointed out that they should not affect each other in any way.
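The priority problem becomes concrete as soon as one tries to build a single merged view over both prefixes. Below is a minimal, purely hypothetical sketch: the function name is invented, and attributes are modeled as plain [name, value] pairs rather than DOM attributes. Whatever the merge does, it must pick a winner whenever _foo and data-foo disagree. Here the choice is simply "the later attribute wins".

```javascript
// Hypothetical merge of _foo and data-foo into one key "foo".
function mergeUnderscoreAndData(attributes) {
  const merged = new Map();
  const conflicts = new Set();
  for (const [name, value] of attributes) {
    let key = null;
    if (name.startsWith('_') && name.length > 1) key = name.slice(1);
    else if (name.startsWith('data-') && name.length > 5) key = name.slice(5);
    if (key === null) continue; // not a custom attribute under either prefix
    if (merged.has(key) && merged.get(key) !== value) conflicts.add(key);
    merged.set(key, value); // arbitrary, order-dependent rule: later one wins
  }
  return { merged, conflicts };
}

const { merged, conflicts } = mergeUnderscoreAndData([
  ['_foo', '1'],
  ['data-foo', '2'],
  ['class', 'x'],
]);
console.log(merged.get('foo'), conflicts.has('foo'));
```

Here merged.get('foo') is '2' and conflicts contains 'foo'. Reversing the attribute order would flip the result. Keeping the two prefixes independent avoids needing any such rule.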

There was some acceptance (namely by Lea Verou, Alexander Farkas, Taylor Hunt aka tigt, and Martin Janecke aka prlbr) of this idea in a related thread at the WICG forum.

Just to note: "support" here only means I hearted the post. I don't actually like this idea, though I do think it's better than data-. It's not paving the cowpaths. I posted a proposal that does in #2271.

Let's continue this discussion in #2271; the two proposals are for the same issue.