whatwg/url

IdnaTestV2.json "xn--xn--a--gua.pt" test case problem

domenic opened this issue · 5 comments

What is the issue with the URL Standard?

When updating jsdom/tr46 to the latest revision of TR46, I implemented this new line of the label validity criteria:

If not CheckHyphens, the label must not begin with “xn--”.
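
As a minimal sketch of that criterion (not the actual jsdom/tr46 code), the check could look like the following. The helper name `passes_xn_prefix_criterion` and the constant `ACE_PREFIX` are hypothetical, and the label is assumed to have already been Punycode-decoded where applicable.

```python
ACE_PREFIX = "xn--"


def passes_xn_prefix_criterion(label: str, check_hyphens: bool) -> bool:
    """Hypothetical helper for the single criterion quoted above.

    `label` is assumed to be the label under validation, i.e. already
    Punycode-decoded if it was an ACE label.
    """
    if check_hyphens:
        # The quoted criterion only applies when CheckHyphens is false.
        return True
    return not label.startswith(ACE_PREFIX)


print(passes_xn_prefix_criterion("xn--a-ä", check_hyphens=False))
print(passes_xn_prefix_criterion("example", check_hyphens=False))
```

With CheckHyphens set to false, the first call reports a violation and the second does not.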

This causes my test suite to fail due to these lines from IdnaTestV2.json:

  {
    "comment": "V2 (ignored)",
    "input": "xn--xn--a--gua.pt",
    "output": "xn--xn--a--gua.pt"
  },

which correspond to these lines from IdnaTestV2.txt:

xn--xn--a--gua.pt; xn--a-ä.pt; [V2]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt
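
To illustrate why this input trips the new criterion, here is a small sketch that uses Python's built-in `punycode` codec to decode the first label. The helper `decode_ace_label` is hypothetical and skips the rest of UTS 46 processing (mapping, normalization, other validity checks).

```python
def decode_ace_label(label: str) -> str:
    """Hypothetical helper: Punycode-decode a label that carries the ACE prefix."""
    prefix = "xn--"
    if label.lower().startswith(prefix):
        # Strip the ACE prefix, then decode the remainder as Punycode.
        return label[len(prefix):].encode("ascii").decode("punycode")
    return label


decoded = decode_ace_label("xn--xn--a--gua")
print(decoded)                     # xn--a-ä
print(decoded.startswith("xn--"))  # True
```

The decoded label itself begins with "xn--", which is exactly what the new criterion forbids when CheckHyphens is false.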

I can't tell whether this is a problem with the source data, or with our conversion script. The conversion script seems to be trying to do something with the V2 error codes, but I'm not sure exactly what.

Note that other V2 error codes in the test data don't seem to cause problems.

This is probably related to #760

annevk commented

This is a new criterion in UTS 46 v31 compared with v29. Contrast with https://www.unicode.org/reports/tr46/tr46-29.html#Validity_Criteria.

It's not entirely clear to me if all the changes made to UTS 46 are correct. Notably we were not consulted on them.

I think this one might have been a result of our requests, or at least an interpretation of our requests. See #760. In particular I think it might align with WebKit, and thus avoid the roundtripping problems others see.

rmisev commented

I think the problem is in IdnaTestV2.txt (from Unicode 15.1). The test you mention is missing the V4 error code. It should be:

xn--xn--a--gua.pt; xn--a-ä.pt; [V2, V4]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt
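
A rough sketch of how the two versions of the line differ for a consumer that disregards hyphen-related codes. This is not the actual conversion script: the names `parse_to_unicode_status`, `effective_errors` and `IGNORED_CODES` are hypothetical, and it assumes that V2 and V3 are the codes ignored when CheckHyphens is false.

```python
# Assumption: with CheckHyphens=false, the hyphen-related codes V2 and V3
# are not treated as failures.
IGNORED_CODES = {"V2", "V3"}


def parse_to_unicode_status(line: str) -> set[str]:
    """Return the status codes from the third field of an IdnaTestV2.txt line."""
    data = line.split("#", 1)[0]                 # drop the trailing comment
    fields = [field.strip() for field in data.split(";")]
    status = fields[2].strip("[]")               # "[V2, V4]" -> "V2, V4"
    return {code.strip() for code in status.split(",") if code.strip()}


def effective_errors(line: str) -> set[str]:
    """Status codes that remain after discarding the ignored ones."""
    return parse_to_unicode_status(line) - IGNORED_CODES


original = "xn--xn--a--gua.pt; xn--a-ä.pt; [V2]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt"
proposed = "xn--xn--a--gua.pt; xn--a-ä.pt; [V2, V4]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt"

print(effective_errors(original))  # set()
print(effective_errors(proposed))  # {'V4'}
```

Under these assumptions the original line leaves no remaining errors, so the input is expected to succeed, while the line with V4 added would be expected to fail.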

This is related to: #603