Bill-Gray/PDCursesMod

wincon with chcp 65001: wgetch chrashes terminal when entering EURO sign

GitMensch opened this issue · 8 comments

That's reproducible, for example with the sample program in #245.
I guess that wincon isn't even to blame but MS, but I'm not sure and as others may stumble over this, too...

Test:

> chcp 65001
> getme
€

Result: the application seems to hang, but even killing it does not change anything (maybe we are in a special "terminal mode"?) - we neither see any further input nor is it processed by the owning cmd.exe, there is no option to scroll.

Using ProcessExplorer I could see the follwing:

When "Windows Terminal" is used, the owning Terminal OpenConsole.exe chrashes (a subprocess WerFault.exe is started and the process tree killed when its collection is finished); when using "plain cmd", the owning "conhost.exe" dies the same way.
In both cases cmd.exe with its subprocess getme stays open until killed.

Using the debugger I couldn't see much, only that as long as PDCurses does not "process" the key (it is waiting in the debugger) there is no crash; and after the chrash PDCurses is still looping in wgetch.

I have no clue if there's something to see in chrashing_code.txt - that's part of a run in which the crash occurs (with skip _show_run_of_ansi_characters active as this flooded the output [not sure if that needs to be called that often] and also with skip gettimeofday as I did not want to inspect the system calls).
I've seen that code while auto-stepping, started "recording", then pressed € then seen more of that code, then seen the crash (only in ProcessExplorer) then seen wgetch still looping, stopped and put the stepping code in the file...

Does this break both with 8-bit and wide-character builds? It would also be interesting to know if it breaks in PDCurses as well as with PDCursesMod.

(Wrote the above -- which may still be pertinent -- then noticed the uninitialized variable problem in my comment on issue #245. I think the fact that linep is uninitialized (well, set equal to an uninitialized value), unless you enter at least 100 characters, is the main problem here.)

I've only checked PDCursesMod 4.8.4 WIDE UTF8; all that building is quite some work, but if you want me to I could try the non-utf8 and also non-wide version, too. Just ask. Reminder: This only happens if UTF8 encoding is active. It may even go away when those HALFWIDTH HANGUL ..." issues in #245 are solved.

Testing with the fixed version of the test program had, as expected, the same results (the bad init had only an effect on the function if more than 100 characters are entered, which wasn't the case and on the return of the function, which wasn't reached in the chrash-scenario).

the bad init had only an effect on the function if more than 100 characters are entered

...had an effect if fewer than 100 characters are entered. (If you hit the hundred-character mark, the buffer is reallocated for 200 characters and then linep is set to line. If you don't, linep is set to garbage.) But I think you're right about it not actually causing trouble until you return from the function... and it's crashing before you get to that point.

The failure to get correct values is entirely to be expected if you've compiled with UTF=Y and run with a non-Unicode locale. Referring to commit 83dcf79, if we have an 8-bit charset, we should look at KEV.uChar.AsciiChar. If we're in a Unicode locale, we look at KEV.uChar.UnicodeChar. uChar is a union, so sadly, we can't get both the eight-bit value and the Unicode value.

To use a Unicode version of PDCurses with an 8-bit locale, we'd have to look at KEV.uChar.AsciiChar and find the corresponding Unicode character. I think that upon writing the Unicode character back to the screen, we might have to convert it back to the code page... though maybe not; in WinGUI, we display Unicode text in all cases by using ExtTextOutW; there may be some similar function for the Windows console. (We may even be using it. I don't know much about the Windows console.)

In the reverse case, where we've built PDCurses with an 8-bit character field but are running in a Unicode locale, we'd have to do opposite trickery. Somebody would hit, say the Euro sign; Windows would give us the Unicode code point U+20AC, and we'd convert that to (for CP1252) 0x80. On output, we'd have to convert the CP1252 (or other eight-bit character set) to Unicode. If you look around lines 445-453 in wingui/pdcdisp.c, you'll see where we do that latter bit, using mbtowc().

Anyway. I'd expect mangled input and output in both of the above scenarios, but I would not expect it to crash. That bit needs investigation.

Seems fixed, based on my probably not exhaustive testing. I'm confident enough to close this.

Should have referenced commit e86d9c3, in which the 'correct' cast was made to fix this issue.

I've tested this with the last CI build - and still got a chrash of the owning terminal :-/

... got pdcursesd.dll so tested the wrong one. Retesting with the correct one - and can confirm that this is solved :-)

... and "kind of back, but reversed": if one uses a non-utf8 non-wide version of PDCursesMod and explicit set chcp 65001 then entering the data during display results in replacement characters... and a crash of the owning terminal.
Not sure if we can do something about that, too.