jgm/doclayout

Wrong length for curly apostrophe

Closed this issue · 4 comments

jgm commented
Prelude Text.DocLayout Data.List> literal "a’s"
Text 2 "a\8217s"

Should be length 3. I found this after noting a bunch of wrapping-related test failures in pandoc from the new doclayout release.

@Xitian9 can you see the problem? I believe this is due to your changes in real length calculation code.

jgm commented

Also with double quotes:

Prelude Text.DocLayout Data.List> literal "a“b"
Text 2 "a\8220b"

Let me look into it.

Good catch. It looks like I mistyped the end of the ‘zero width joiners and directional markers’ block, which should be \x2010, not \x2030 (they are next to each other on my keyboard). I'll issue a fix.
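To illustrate how a single mistyped bound produces the lengths reported above, here is a minimal sketch. It is not doclayout's actual code: `isZeroWidthUpTo`, `widthWith`, `buggyWidth` and `fixedWidth` are hypothetical names, and the block is assumed to start at U+200B and to use an exclusive upper bound.

```haskell
import Data.Char (ord)

-- Hypothetical zero-width check, parameterised by the exclusive upper
-- bound of a block assumed to start at U+200B (zero width space).
isZeroWidthUpTo :: Int -> Char -> Bool
isZeroWidthUpTo end c =
  let n = ord c
  in n >= 0x200B && n < end

-- Simplified width: 0 for code points inside the block, 1 otherwise.
-- (Wide and ambiguous characters are deliberately ignored here.)
widthWith :: Int -> String -> Int
widthWith end = length . filter (not . isZeroWidthUpTo end)

buggyWidth, fixedWidth :: String -> Int
buggyWidth = widthWith 0x2030 -- mistyped bound: swallows U+2010..U+202F
fixedWidth = widthWith 0x2010 -- intended bound: block ends at U+200F
```

With the mistyped bound, U+2019 (’) and U+201C (“) fall inside the block and count as zero width, so `buggyWidth "a\8217s"` is 2 while `fixedWidth "a\8217s"` is 3.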

It would be good to come up with a more comprehensive test suite to catch these things. There are a lot of extra characters sprinkled throughout Unicode that should be zero-width, and I certainly haven't gotten all of them.

Here's a place to start: https://en.wikipedia.org/wiki/Combining_character
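One possible starting point for such a test suite is a table of inputs with expected widths, cross-checked against Unicode general categories from `Data.Char`. This is only a sketch: `zeroWidthCandidate`, `expectedWidth` and `cases` are hypothetical names, and treating every non-spacing mark, enclosing mark and format character as zero width is an assumption that will need exceptions.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Assumption: combining marks and format characters are the
-- zero-width candidates. This is an approximation, not a spec.
zeroWidthCandidate :: Char -> Bool
zeroWidthCandidate c =
  generalCategory c `elem` [NonSpacingMark, EnclosingMark, Format]

-- Expected width when every other character counts as width 1.
-- (Wide and ambiguous characters are out of scope for this sketch.)
expectedWidth :: String -> Int
expectedWidth = length . filter (not . zeroWidthCandidate)

-- (input, expected width) pairs, including the two cases from this issue.
cases :: [(String, Int)]
cases =
  [ ("a\8217s", 3) -- U+2019 right single quotation mark
  , ("a\8220b", 3) -- U+201C left double quotation mark
  , ("e\769", 1)   -- U+0301 combining acute accent
  , ("a\8205b", 2) -- U+200D zero width joiner
  ]
```

In a real test, each expected width would be compared against the length the library reports for `literal` applied to the same string, rather than against `expectedWidth` itself.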

Also, ambiguous characters technically should be either width 1 or 2 depending on what they're surrounded by…