jgm/unicode-collation

Test suite fails when compiled against ghc 9.2

Closed · 9 comments

jgm commented
Resolving dependencies...
Build profile: -w ghc-9.2.1 -O1
In order, the following will be built (use -v for more details):
 - unicode-collation-0.1.3.1 (lib) (first run)
 - unicode-collation-0.1.3.1 (test:unit) (first run)
Configuring library for unicode-collation-0.1.3.1..
Preprocessing library for unicode-collation-0.1.3.1..
Building library for unicode-collation-0.1.3.1..
[ 1 of 10] Compiling Text.Collate.Lang
[ 2 of 10] Compiling Text.Collate.Trie
[ 3 of 10] Compiling Text.Collate.UnicodeData
[ 4 of 10] Compiling Text.Collate.CanonicalCombiningClass
[ 5 of 10] Compiling Text.Collate.Normalize
[ 6 of 10] Compiling Text.Collate.Collation
[ 7 of 10] Compiling Text.Collate.TH
[ 8 of 10] Compiling Text.Collate.Tailorings
[ 9 of 10] Compiling Text.Collate.Collator
[10 of 10] Compiling Text.Collate
Configuring test suite 'unit' for unicode-collation-0.1.3.1..
Preprocessing test suite 'unit' for unicode-collation-0.1.3.1..
Building test suite 'unit' for unicode-collation-0.1.3.1..
[1 of 1] Compiling Main
Linking /Users/jgm/src/unicode-collation/dist-newstyle/build/aarch64-osx/ghc-9.2.1/unicode-collation-0.1.3.1/t/unit/build/unit/unit ...
Running 1 test suites...
Test suite unit: RUNNING...
Loading conformance test data...
Tests
  Conformance tests
    Conformance tests NonIgnorable test/uca-collation-test/CollationTest_NON_IGNORABLE_SHORT.txt: FAIL (0.68s)
      test/unit.hs:172:
      [line 19] D82F DC9E 0334 <= 0335 0334
        Calculated sort keys:
        [D82F DC9E 0334] [FBC1 D82F FBC1 DC9E | 0020 0020 004A | 0002 0002 0002]
        [0335 0334] [| 0039 004A | 0002 0002]
      [line 345] 0334 D800 DEE0 <= D800 DEE0 0334
        Calculated sort keys:
        [0334 D800 DEE0] [FBC1 D800 FBC1 DEE0 | 004A 0020 0020 | 0002 0002 0002]
        [D800 DEE0 0334] [FBC1 D800 FBC1 DEE0 | 0020 0020 004A | 0002 0002 0002]
      [line 347] 0334 D804 DF66 <= D804 DF66 0334
        Calculated sort keys:
        [0334 D804 DF66] [FBC1 D804 FBC1 DF66 | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF66 0334] [FBC1 D804 FBC1 DF66 | 0020 0020 004A | 0002 0002 0002]
      [line 349] 0334 D804 DF67 <= D804 DF67 0334
        Calculated sort keys:
        [0334 D804 DF67] [FBC1 D804 FBC1 DF67 | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF67 0334] [FBC1 D804 FBC1 DF67 | 0020 0020 004A | 0002 0002 0002]
      [line 351] 0334 D804 DF68 <= D804 DF68 0334
        Calculated sort keys:
        [0334 D804 DF68] [FBC1 D804 FBC1 DF68 | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF68 0334] [FBC1 D804 FBC1 DF68 | 0020 0020 004A | 0002 0002 0002]
      [line 353] 0334 D804 DF69 <= D804 DF69 0334
        Calculated sort keys:
        [0334 D804 DF69] [FBC1 D804 FBC1 DF69 | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF69 0334] [FBC1 D804 FBC1 DF69 | 0020 0020 004A | 0002 0002 0002]
      [line 355] 0334 D804 DF6A <= D804 DF6A 0334
        Calculated sort keys:
        [0334 D804 DF6A] [FBC1 D804 FBC1 DF6A | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF6A 0334] [FBC1 D804 FBC1 DF6A | 0020 0020 004A | 0002 0002 0002]
      [line 357] 0334 D804 DF6B <= D804 DF6B 0334
        Calculated sort keys:
        [0334 D804 DF6B] [FBC1 D804 FBC1 DF6B | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF6B 0334] [FBC1 D804 FBC1 DF6B | 0020 0020 004A | 0002 0002 0002]
      [line 359] 0334 D804 DF6C <= D804 DF6C 0334
        Calculated sort keys:
        [0334 D804 DF6C] [FBC1 D804 FBC1 DF6C | 004A 0020 0020 | 0002 0002 0002]
        [D804 DF6C 0334] [FBC1 D804 FBC1 DF6C | 0020 0020 004A | 0002 0002 0002]
...
  Sorting test 1:                                                                                 OK (0.04s)
  Sorting test 2:                                                                                 FAIL (0.05s)
    test/unit.hs:38:
    expected: ["ab\169","abc","abC","\55349\56502bc","\55349\56658bc","Abc","ab\231","\228bc","ab\65534c","fil\233-110","file-12","File-3","\12363","\12533","\12459","\65398","\12364","\12460"]
     but got: ["ab\169","abc","abC","Abc","ab\231","\228bc","ab\65534c","fil\233-110","file-12","File-3","\12363","\12533","\12459","\65398","\12364","\12460","\55349\56502bc","\55349\56658bc"]
    Use -p '/Sorting test 2/' to rerun this test only.
  Variable ordering test

jgm commented

Library versions with ghc 9.2:

   base-4.16.0.0 binary-0.8.9.0 bytestring-0.11.1.0 containers-0.6.5.1
    parsec-3.1.14.0 template-haskell-2.18.0.0 text-1.2.5.0
    th-lft-nstncs-0.1.18-df4ad7d0

With ghc 8.10:

  base-4.14.3.0 binary-0.8.8.0 bytestring-0.10.12.0
    containers-0.6.5.1 parsec-3.1.14.0 template-haskell-2.16.0.0
    text-1.2.4.1 th-lft-nstncs-0.1.18-a3981860

Things to check:

  • changes in base 4.14 -> 4.16
  • changes in binary 0.8.8 -> 0.8.9
  • changes in bytestring 0.10.12 -> 0.11
  • changes in text 1.2.4.1 -> 1.2.5.0

jgm commented

I tried compiling with stack (ghc 8.10) and the following extra-deps, and the test suite passed:

- binary-0.8.9.0
- bytestring-0.11.1.0
- text-1.2.5.0
- parsec-3.1.14.0
- process-1.6.13.2
- unix-2.7.2.2
- directory-1.3.7.0

So I guess it's changes in base (or ghc 9.2) that we need to be looking at.

jgm commented

Possibly relevant item from base changelog:

  • Character set metadata bumped to Unicode 13.0.0.

(Hard to see why this would matter, though. We use Unicode 13.0.0 tables here. Also, Data.Char is only really used for ord, and a few other functions in Lang parsing.)

jgm commented

Here's one example of the difference between 8.10 and 9.2:

8.10
      [line 19] 1BC9E 0334 <= 0335 0334
        Calculated sort keys:
        [1BC9E 0334] [| 0035 004A | 0002 0002]
        [0335 0334] [| 0039 004A | 0002 0002]
9.2
      [line 19] D82F DC9E 0334 <= 0335 0334
        Calculated sort keys:                
        [D82F DC9E 0334] [FBC1 D82F FBC1 DC9E | 0020 0020 004A | 0002 0002 0002]
        [0335 0334] [| 0039 004A | 0002 0002]

Note that D82F DC9E is the UTF-16 surrogate pair representing U+1BC9E;
see http://www.russellcottrell.com/greek/utilities/surrogatepaircalculator.htm

Algorithm:
S = ((H - 0xD800) * 0x400) + (L - 0xDC00) + 0x10000
where H is the high surrogate and L is the low surrogate.
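
For reference, the surrogate arithmetic above can be sketched in Haskell using only base. The names fromSurrogates and toSurrogates are made up for illustration and are not part of unicode-collation or text. Neither function validates its input range.

```haskell
import Data.Bits (shiftR, (.&.))

-- | Combine a UTF-16 high/low surrogate pair into a single code point.
-- Assumes h is in D800..DBFF and l is in DC00..DFFF (not checked here).
fromSurrogates :: Int -> Int -> Int
fromSurrogates h l = ((h - 0xD800) * 0x400) + (l - 0xDC00) + 0x10000

-- | Inverse direction: split a code point above 0xFFFF into (high, low).
toSurrogates :: Int -> (Int, Int)
toSurrogates s = (0xD800 + (s' `shiftR` 10), 0xDC00 + (s' .&. 0x3FF))
  where
    s' = s - 0x10000
```

With these definitions, fromSurrogates 0xD82F 0xDC9E evaluates to 0x1BC9E (113822), and toSurrogates 0x1BC9E gives back (0xD82F, 0xDC9E), matching the two forms seen in the test output.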

jgm commented

Line 19 of the NON_IGNORABLE conformance test has:

1BC9E 0334                               

With ghc 8.10 this is being treated as 0x1BC9E, 0x0334
With 9.2, as 0xD82F, 0xDC9E, 0x0334

Possibly a difference in parseConformanceTestLine?
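
One way to narrow this down is to parse the line at the String level and inspect the resulting code points before any Text conversion happens. The sketch below is a hypothetical stand-in, not the actual parseConformanceTestLine from test/unit.hs. It assumes the line is just space-separated hex code points, and the names parseHexCodePoints and showCodePoints are invented here.

```haskell
import Data.Char (chr, ord)
import Numeric (readHex, showHex)

-- | Hypothetical stand-in for a conformance-line parser (NOT the code in
-- test/unit.hs): read space-separated hex code points into a String.
-- Tokens that are not entirely hex digits are silently skipped.
parseHexCodePoints :: String -> String
parseHexCodePoints line =
  [ chr n | w <- words line, [(n, "")] <- [readHex w] ]

-- | Render each character as a lower-case hex code point for inspection.
showCodePoints :: String -> [String]
showCodePoints = map (\c -> showHex (ord c) "")
```

If parseHexCodePoints "1BC9E 0334" yields a two-character String on both compilers, the String-level parsing is not the culprit and the difference must arise later.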

jgm commented

Interesting fact: the test failure above is on an M1 Mac.
On an Intel Mac, the test passes.

jgm commented

With ghc 9.2.1 on M1:

% cabal repl --constrain 'text==1.2.5.0' -w ~/ghc9.2.1/bin/ghc        
...
ghci> Data.Text.pack "\113822\820"
"\55343\56478\820"

With ghc 8.10:

% cabal repl --constrain 'text==1.2.5.0'   
...
*Text.Collate> Data.Text.pack "\113822\820"
"\113822\820"

With ghc 9.2.1 on Intel:

 % cabal repl --constrain 'text==1.2.5.0'
Resolving dependencies...
Build profile: -w ghc-9.2.1 -O1
...
ghci> Data.Text.pack "\113822\820"
"\113822\820"

jgm commented

@Bodigrim Any ideas what could be going on here? Is this a ghc bug?
The above comment sums up the issue (different behavior with Data.Text.pack on Intel vs M1 with ghc 9.2.1 and text 1.2.5.0).
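
A quick way to check whether a given build is affected is to look for surrogate code points in the unpacked result. This is only a sketch using base; containsSurrogates is a name invented here, not a function from text or unicode-collation.

```haskell
import Data.Char (GeneralCategory (Surrogate), generalCategory)

-- | True if the string contains any surrogate code points
-- (U+D800..U+DFFF), whether paired or lone.
containsSurrogates :: String -> Bool
containsSurrogates = any ((== Surrogate) . generalCategory)
```

In a repl where text is available, containsSurrogates (Data.Text.unpack (Data.Text.pack "\113822\820")) should be False on an unaffected build. Judging from the ghci output above, it would presumably be True with ghc 9.2.1 on M1.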

jgm commented

I no longer see this with ghc 9.2.2 on an M1 Mac, so I'm assuming the issue has been fixed.