Windows + UTF8 diacritical character output problem
Hi there! Awesome work on the win32a variant of PDCurses, I'm really enjoying working with it.
However, I seem to have a problem when calling wprintw with UTF8 strings that contain certain diacritical marks. In this particular case, I've found the acute accent ´, aka 0xb4, to cause strange behavior. Specifically, the output is terminated at this character, and the next line is bunched up on the previous line. Sorry if that's a crappy description, here's an example:
Expected output:
A Hard Day´s Night
Abbey Road
Beatles For Sale
...
But here's what it actually looks like:
A Hard DayAbbey Road
Beatles For Sale
...
Is there any known solution or workaround for this problem, besides the obvious "use a regular apostrophe instead"?
Thanks!
Hmmm... here's a minimal example that does produce that acute accent:

#include <curses.h>

int main( const int argc, const char **argv)
{
    initscr();
    cbreak( );
    noecho( );
    clear( );
    refresh( );
    /* \xc2\xb4 is the two-byte UTF-8 encoding of the acute accent (U+00B4) */
    printw( "A Hard Day\xc2\xb4s Night\n");
    printw( "Abbey Road\n");
    refresh();
    getch();             /* wait for a key so the output stays visible */
    refresh();
    endwin();
    return( 0);
}
You'll notice that the acute accent in the printw() call has been UTF-8
encoded, resulting in it becoming two bytes instead of one :
https://en.wikipedia.org/wiki/UTF-8
I tried just doing it as "A Hard Day\xb4s Night\n" (which isn't a valid
UTF-8 string) and got exactly the behavior you describe. I've not checked
all that closely, but I'd wager that the code marches along through the
string, finds invalid UTF-8, and stops.
If you _do_ have trouble even with a for-real UTF-8 string, I'd give
the above mini-program a try and see what it does.
-- Bill
(In reply to clangen's message of 2016-05-14 01:27; thread: https://github.com/Bill-Gray/PDCurses/issues/5)
Shoot, you're absolutely right -- I had a problem with my UTF8 decoding and it was missing that leading byte. Argh! Apologies for the waste of time, and thanks for looking into this so promptly!