Bill-Gray/PDCursesMod

Windows + UTF8 diacritical character output problem

Closed this issue · 2 comments

Hi there! Awesome work on the win32a variant of PDCurses, I'm really enjoying working with it.

However, I seem to have a problem when calling wprintw with UTF-8 strings that contain certain diacritical marks. In this particular case, I've found that the acute accent ´ (0xb4) causes strange behavior. Specifically, the output is cut off at this character, and the next line is bunched up on the previous line. Sorry if that's a crappy description, here's an example:

Expected output:

A Hard Day´s Night
Abbey Road
Beatles For Sale
...

But here's what it actually looks like:

A Hard DayAbbey Road
Beatles For Sale
...

Is there any known solution or work around for this problem? Besides the obvious "use a regular apostrophe instead?"

Thanks!

Hmmm... here's a minimal example that does produce that acute accent:

    #include <curses.h>

    int main( const int argc, const char **argv)
    {
        initscr();
        cbreak( );
        noecho( );
        clear( );
        refresh( );

        /* "\xc2\xb4" is the UTF-8 encoding of the acute accent, U+00B4 */
        printw( "A Hard Day\xc2\xb4s Night\n");
        printw( "Abbey Road\n");

        refresh();
        getch();

        refresh();

        endwin();
        return( 0);
    }

You'll notice that the acute accent in the printw() call has been UTF-8 encoded, resulting in it becoming two bytes (0xc2 0xb4) instead of one:

https://en.wikipedia.org/wiki/UTF-8
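
To make the two-byte result concrete, here is a minimal sketch of UTF-8 encoding for a single code point. The function name utf8_encode is hypothetical and is not part of PDCurses; it only illustrates how U+00B4 turns into the bytes 0xc2 0xb4 used in the printw() call above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PDCurses: encode one Unicode code
   point as UTF-8.  Writes up to 4 bytes to out and returns the number
   of bytes written, or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx plus three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

Calling utf8_encode(0xB4, buf) returns 2 and leaves 0xc2 in buf[0] and 0xb4 in buf[1]. Dropping the first of those two bytes leaves a lone continuation byte, which is not valid UTF-8 on its own.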

I tried just doing it as "A Hard Day\xb4s Night\n" (which isn't a valid UTF-8 string) and got exactly the behavior you describe. I've not checked all that closely, but I'd wager that the code marches along through the string, finds invalid UTF-8, and stops.
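
One way to check a string before handing it to printw() is to scan it for the first malformed sequence. The sketch below is an assumption about how such a check could look, not the code PDCurses actually runs; utf8_first_invalid is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PDCurses: return the byte offset of
   the first malformed UTF-8 sequence in the NUL-terminated string s,
   or -1 if the whole string is well formed. */
long utf8_first_invalid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    assert(s != NULL);

    while (p[i] != 0) {
        unsigned char c = p[i];
        unsigned long cp, min;
        size_t extra, k;

        if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else
            return (long)i;          /* stray continuation or bad lead byte */

        for (k = 1; k <= extra; k++) {
            /* A NUL here also fails the test, so we never read past it. */
            if ((p[i + k] & 0xC0) != 0x80)
                return (long)i;      /* truncated sequence */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;

        i += extra + 1;
    }
    return -1;
}
```

With this sketch, utf8_first_invalid("A Hard Day\xc2\xb4s Night\n") returns -1, while utf8_first_invalid("A Hard Day\xb4s Night\n") returns 10, the offset where the output in the report was cut off.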

If you _do_ have trouble even with a for-real UTF-8 string, I'd give the above mini-program a try and see what it does.

-- Bill

On 2016-05-14 01:27, clangen wrote:

https://github.com/Bill-Gray/PDCurses/issues/5

Shoot, you're absolutely right -- I had a problem with my UTF8 decoding and it was missing that leading byte. Argh! Apologies for the waste of time, and thanks for looking into this so promptly!