sharkdp/fd

[BUG] Redirected stdout (pipe or file) on windows has wrong encoding

jeffrson opened this issue · 2 comments

Checks

  • I have read the troubleshooting section and still think this is a bug.

Describe the bug you encountered:

I have a folder that contains (for demonstration purposes) 3 folders, with names 'hälfte', 'hören', 'hülle' (containing umlauts).
fd displays them correctly in the terminal:

> fd
hälfte\
hören\
hülle\

However, when I redirect this output to a file or pipe it to another program, the umlauts become unreadable (I don't know which encoding this is):

> fd | xxd
00000000: 68e2 949c c3b1 6c66 7465 5c0d 0a68 e294  h.....lfte\..h..
00000010: 9cc3 8272 656e 5c0d 0a68 e294 9ce2 959d  ...ren\..h......
00000020: 6c6c 655c 0d0a                           lle\..

Describe what you expected to happen:

Output should be like:

> echo hälfte | xxd
00000000: 68c3 a46c 6674 650d 0a                   h..lfte..
> echo hören | xxd
00000000: 68c3 b672 656e 0d0a                      h..ren..
> echo hülle | xxd
00000000: 68c3 bc6c 6c65 0d0a                      h..lle..

What version of fd are you using?

fd 9.0.0

Which operating system / distribution are you on?

Windows 22H2 19045.4170
Windows Terminal with codepage 65001
> chcp
Aktive Codepage: 65001.

That looks like some unholy combination of UTF-8, Latin-1, and codepage 437:

>>> ftfy.fix_and_explain('\x68\xe2\x94\x9c\xc3\xb1\x6c\x66\x74\x65')
ExplainedText(text='hälfte', explanation=[('encode', 'latin-1'), ('decode', 'utf-8'), ('encode', 'cp437'), ('decode', 'utf-8')])
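For anyone without ftfy installed, the four steps in that explanation can be replayed with Python's built-in codecs. This is only a sketch of what ftfy reports here, not a general repair routine; the variable names are made up for illustration.

```python
# The garbled bytes from the xxd dump, written as a str of code points < 256.
garbled = '\x68\xe2\x94\x9c\xc3\xb1\x6c\x66\x74\x65'

# ('encode', 'latin-1'): turn those code points back into raw bytes.
raw_bytes = garbled.encode('latin-1')

# ('decode', 'utf-8'): the bytes are valid UTF-8, but for the wrong characters.
mojibake = raw_bytes.decode('utf-8')

# ('encode', 'cp437'): map the box-drawing characters back to single bytes.
original_bytes = mojibake.encode('cp437')

# ('decode', 'utf-8'): those bytes are the UTF-8 encoding of the real name.
fixed = original_bytes.decode('utf-8')

print(repr(mojibake), '->', repr(fixed))
```

The intermediate value shows what happened: the two UTF-8 bytes of "ä" (c3 a4) were each treated as one character of an OEM code page and then encoded to UTF-8 a second time.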

Are you using PowerShell? There was a similar report in #1047 but it was fixed by switching the system to UTF-8. Your system is already on it (codepage 65001) so I don't know why it's not working. Unfortunately I am not a Windows expert so we may need some outside help for this.

I had seen that other report, so I checked chcp, which was already 65001.

I wasn't aware that this might be a shell-related issue, thanks for pointing this out. I'm using PowerShell Core; "Windows PowerShell" (5.1) and cmd behave differently.

Then I found PowerShell/PowerShell#17523 which helped:
[Console]::InputEncoding = [Console]::OutputEncoding = $OutputEncoding = [System.Text.UTF8Encoding]::new($false)

I compared the original value of [Console]::OutputEncoding and realized it was still set to codepage 850 (despite chcp 65001).
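Codepage 850 is consistent with the hex dump above. A minimal sketch in Python, assuming the shell decoded fd's UTF-8 output as codepage 850 and then re-encoded it as UTF-8 for the pipe (the helper name `garble` is hypothetical, and this models the observed bytes, not PowerShell's actual internals):

```python
def garble(name: str, console_codepage: str = 'cp850') -> bytes:
    """Simulate the suspected round trip through a mismatched console encoding."""
    utf8_bytes = name.encode('utf-8')               # what fd writes to stdout
    misread = utf8_bytes.decode(console_codepage)   # shell decodes with the wrong code page
    return misread.encode('utf-8')                  # shell re-encodes for the next program

for name in ('hälfte', 'hören', 'hülle'):
    print(name, garble(name).hex(' '))
```

The second name is the one that distinguishes codepage 850 from 437: the dump contains c3 82 ("Â"), which is what byte b6 maps to in codepage 850, while codepage 437 would give a box-drawing character there.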

Anyway, I guess this could be considered "fixed" (not a bug) :-)