[BUG] Redirected stdout (pipe or file) on Windows has wrong encoding
jeffrson opened this issue · 2 comments
Checks
- I have read the troubleshooting section and still think this is a bug.
Describe the bug you encountered:
I have a folder that contains (for demonstration purposes) 3 folders, with names 'hälfte', 'hören', 'hülle' (containing umlauts).
fd shows them correctly:
> fd
hälfte\
hören\
hülle\
However, when I redirect this output to a file or pipe it to another program, the umlauts become unreadable (I don't know which encoding this is):
> fd | xxd
00000000: 68e2 949c c3b1 6c66 7465 5c0d 0a68 e294 h.....lfte\..h..
00000010: 9cc3 8272 656e 5c0d 0a68 e294 9ce2 959d ...ren\..h......
00000020: 6c6c 655c 0d0a lle\..
Describe what you expected to happen:
Output should be like:
> echo hälfte | xxd
00000000: 68c3 a46c 6674 650d 0a h..lfte..
> echo hören | xxd
00000000: 68c3 b672 656e 0d0a h..ren..
> echo hülle | xxd
00000000: 68c3 bc6c 6c65 0d0a h..lle..
What version of fd are you using?
fd 9.0.0
Which operating system / distribution are you on?
Windows 22H2 19045.4170
Windows Terminal with codepage 65001
> chcp
Aktive Codepage: 65001.
That looks like some unholy combination of UTF-8, Latin-1, and codepage 437:
>>> ftfy.fix_and_explain('\x68\xe2\x94\x9c\xc3\xb1\x6c\x66\x74\x65')
ExplainedText(text='hälfte', explanation=[('encode', 'latin-1'), ('decode', 'utf-8'), ('encode', 'cp437'), ('decode', 'utf-8')])
Are you using PowerShell? There was a similar report in #1047 but it was fixed by switching the system to UTF-8. Your system is already on it (codepage 65001) so I don't know why it's not working. Unfortunately I am not a Windows expert so we may need some outside help for this.
I had seen that other report, so I checked chcp, which was already 65001.
I wasn't aware that this might be a shell-related issue, thanks for pointing this out. I'm using PowerShell Core; "Windows PowerShell" (5.1) and cmd behave differently.
Then I found PowerShell/PowerShell#17523 which helped:
[Console]::InputEncoding = [Console]::OutputEncoding = $OutputEncoding = [System.Text.UTF8Encoding]::new($false)
I compared the original values of [Console]::OutputEncoding and realized it was still set to codepage 850 (despite chcp 65001).
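For anyone who wants to check this, the bytes in the xxd dump above appear consistent with a codepage 850 round trip. The following is only a minimal Python sketch under the assumption that the shell decodes fd's UTF-8 output as codepage 850 and then re-encodes the result as UTF-8 when piping; the helper name mangle is made up for illustration and is not part of fd or PowerShell.

```python
# Assumption (not confirmed by the fd or PowerShell sources): the shell reads
# fd's UTF-8 bytes using the console output encoding (codepage 850 here) and
# writes the decoded text back out as UTF-8 when piping.

def mangle(text: str, console_codepage: str = "cp850") -> bytes:
    """Hypothetical helper: simulate the suspected double-encoding round trip."""
    raw = text.encode("utf-8")              # what fd writes to stdout
    misread = raw.decode(console_codepage)  # shell misreads the bytes
    return misread.encode("utf-8")          # shell re-encodes for the pipe


for name in ("hälfte", "hören", "hülle"):
    # Print the simulated bytes so they can be compared with the xxd dump.
    print(name, "->", mangle(name).hex(" "))
```

With "cp437" as the second argument, the result for "hören" differs from the dump, which suggests codepage 850 rather than 437 was involved here.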
Anyway, I guess this could be considered "fixed" (non bug) :-)