beyondgrep/ack1

Output of non-ASCII chars garbled if they match inverted character class

Closed this issue · 1 comment

I have a file containing one non-ASCII character, e.g. the German umlaut "ö". When I match the "ö" directly, the output is fine. However, the output is garbled when the "ö" is matched by an inverted character class.

My use case is that I'm searching for files that still use encodings other than UTF-8, and for that I use a character class that excludes all "known good" characters. However, this problem also occurs with UTF-8 encoded files.
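For readers with the same use case, here is a minimal sketch of one way to find files that are not valid UTF-8 without going through a character class at all. It is a different technique from the ack-based search described above, and the function names `is_valid_utf8` and `find_non_utf8_files` are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(root: str) -> List[Path]:
    """List regular files under root whose contents are not valid UTF-8.

    Hypothetical helper: it reads each file fully into memory, which is
    fine for source trees but not for very large files.
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and not is_valid_utf8(path.read_bytes())
    )
```

Note that this only reports files that fail to decode; a pure-ASCII file is also valid UTF-8 and is therefore not listed.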

Here's an example (copy & paste from the console):

[0 mbunkus@chai-latte ~] ack ö hallo.txt
Hallöle
[0 mbunkus@chai-latte ~] ack -i '[^a-z]' hallo.txt
Hall[0m�le
[0 mbunkus@chai-latte ~] cat hallo.txt
Hallöle
[0 mbunkus@chai-latte ~] locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Note that the colorization covers only the "ö" in the good case and "[0m�" in the bad case. In other words, the colorization highlights the right span; only the characters that are output are wrong.
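One possible explanation, which this report does not confirm, is that the inverted class is applied to the raw UTF-8 bytes rather than to decoded characters. In that case `[^a-z]` would match only the first byte of the two-byte "ö", and the color reset sequence would be written between the two bytes. The Python sketch below only illustrates that hypothesis; it is not ack's code, and the `highlight` helper and its escape codes are assumptions made for the example.

```python
import re

line = "Hallöle"
raw = line.encode("utf-8")  # "ö" becomes the two bytes 0xC3 0xB6

# Character-level match: the class sees a single code point, "ö".
char_match = re.search(r"[^a-z]", line, re.IGNORECASE)

# Byte-level match: the class sees only the first byte of "ö" (0xC3).
byte_match = re.search(rb"[^a-z]", raw, re.IGNORECASE)


def highlight(data: bytes, match: "re.Match[bytes]") -> bytes:
    """Wrap the matched span in color-on and color-reset sequences.

    Hypothetical helper; the escape codes are illustrative, not ack's.
    """
    color_on, color_off = b"\x1b[30;43m", b"\x1b[0m"
    start, end = match.start(), match.end()
    return data[:start] + color_on + data[start:end] + color_off + data[end:]


# The reset sequence lands between 0xC3 and 0xB6, so the result is no
# longer valid UTF-8 and a terminal cannot render the "ö".
broken = highlight(raw, byte_match)
```

Under this assumption, matching against decoded text and encoding only on output would keep the two bytes of "ö" together.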

Edit: this is ack 2.04
Edit2: it also happens with ack git at 3e498f7.

Would you please repost this in petdance/ack2?