an-cabal/an-rope

Indexing ropes with Unicode characters is Wildly Broken

Closed · 1 comment

hawkw commented

I was working on adding some QuickCheck property tests for Ropes, and I wrote two tests to ensure that Rope insert behaviour is equivalent to String insert behaviour.

Unfortunately, these tests revealed an unrelated issue: when QuickCheck generates a string containing non-ASCII (multi-byte) Unicode characters, the string slicing done in Rope.split() panics.

Here are a handful of the panics (it tried a lot of weird strings, so I'm only attaching the first couple):

---- test::properties::rope_insert_is_string_insert stdout ----
        thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `¨T=/%�𞹧*jL~1ju츿瑞2⠇3*〗򸨋F+ຘ⁆숥⁧G0𦏦𦏦^┏n䘧Y]c,”¥h䱤^‟굃…7‧[2�
                                                           &›몍% G=‑ª؜S񖤰A櫾2|򮁺|?(¢惐6�򽑤 ‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `”¥h䱤^‟굃…7‧[2�
  &›몍% G=‑ª؜S񖤰A櫾2|򮁺|?(¢惐6�򽑤 ‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `”¥h䱤^‟굃…7‧[2�
  &›몍% ‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `”¥h䱤^‟굃…7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `^‟굃…7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `…7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ဓ7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠊ7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠂ7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀ7‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀ‧‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀန‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠊ‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠂ‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠀ‹` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠀဝ` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠀࠏ` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠀࠇ` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/str/mod.rs:1754
thread 'test::properties::rope_insert_is_string_insert' panicked at 'index 0 and/or 9 in `ࠀࠀࠃ` do not lie on character boundary', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libc

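Obviously we are not currently handling Unicode indices properly: per the panic message, the byte range 0..9 passed to the string slice does not lie on UTF-8 character boundaries, so we are indexing by bytes where we should be indexing by characters (or graphemes).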
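For anyone reading along, here is a minimal std-only sketch of the failure mode and one possible way around it. It is not the an-rope implementation: `char_to_byte_idx` and `split_at_char` are hypothetical helper names invented for illustration, and the sample string is written with escapes so that its byte length is unambiguous. The idea is to translate a character index into a byte offset before slicing, so the slice always lands on a character boundary.

```rust
/// Hypothetical helper: convert a character index into a byte offset in `s`.
/// Returns `None` if `char_idx` is past the end of the string.
fn char_to_byte_idx(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        // One past the last character is also a valid split point.
        .chain(std::iter::once(s.len()))
        .nth(char_idx)
}

/// Hypothetical helper: split `s` at a character index instead of a byte index.
fn split_at_char(s: &str, char_idx: usize) -> Option<(&str, &str)> {
    char_to_byte_idx(s, char_idx).map(|byte_idx| s.split_at(byte_idx))
}

fn main() {
    // Four characters, ten bytes: U+2026 (3), '7' (1), U+2027 (3), U+2039 (3).
    let s = "\u{2026}7\u{2027}\u{2039}";

    // Byte 9 falls inside the last character, so `&s[0..9]` would panic.
    // The checked variants report the problem without panicking.
    println!("byte 9 is a char boundary: {}", s.is_char_boundary(9));
    println!("s.get(0..9) = {:?}", s.get(0..9));

    // Splitting by character index always lands on a boundary.
    println!("split_at_char(s, 3) = {:?}", split_at_char(s, 3));
    println!("split_at_char(s, 5) = {:?}", split_at_char(s, 5));
}
```

Note that this works on Unicode scalar values (Rust `char`s), not grapheme clusters. If the Rope API is meant to index by grapheme, the same approach would need grapheme boundaries instead, which the standard library does not provide; the unicode-segmentation crate is one option for that.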

hawkw commented

Failing tests have been added to the fix-unicode-indexing branch at 1507e30.