Lousy Unicode support in WinGHCi and fix for it
What steps will reproduce the problem?
1. Launch WinGHCi
2. Enter "x <- getLine"
3. Enter or paste "résumé 履歴書 резюме" into the editor
4. Enter "putStrLn x"
The expected output is "résumé 履歴書 резюме". Instead,
"rГ©sumГ© 履жґж›ё резюме" is shown.
I am using Haskell Platform 2012.4.0.0, with GHC 7.4.2, WinGHCi 1.0.6 on
Windows 7 x64, with CP_ACP set to CP1251 and CP_OEMCP set to CP866 (Windows
Cyrillic).
The explanation is as follows: WinGHCi merely runs a copy of ghci, passes user
input to ghci's stdin, and reads ghci's stdout. To do this, WinGHCi first sets
the code pages of ghci's console to the ACP. To send a command, it converts the
user input (which is in the native Unicode encoding) to UTF-8 and passes the
resulting byte string to ghci's stdin. ghci, however, interprets this input as
being in the ACP, not in UTF-8, which corrupts non-ASCII characters. Receiving
ghci's output, on the other hand, is done on the assumption that ghci's stdout
yields data in the ACP, which doesn't cause any additional corruption.
Proposed fix: make the interaction between WinGHCi and ghci happen completely in
UTF-8. It can be done as follows:
1. In file StartGHCI\StartGHCI.c, lines 65 and 66 must be replaced with this:
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
2. In file Utf8.c, line 138 must be replaced with this:
INT res = MultiByteToWideChar(CP_UTF8, 0, strIn, lenIn, strOut, maxWChars);
After recompiling, steps 1-4 produce the intended result: "résumé 履歴書
резюме".
The proposed patch is attached as a "Unified DIFF" file, generated with Git's
help.
Original issue reported on code.google.com by Joker...@gmail.com
on 21 Feb 2013 at 12:59