smourier/TraceSpy

Unicode symbols not displayed

Closed this issue · 21 comments

Instead I get garbage like this, even though I've changed the font and the fonts support the symbols
tracespy

I suppose you're speaking about OutputDebugString ("ODS") traces as it should work fine for ETW traces.

Indeed, it didn't work as Unicode support for OutputDebugString is very limited as explained in official documentation: https://learn.microsoft.com/en-us/windows/win32/api/debugapi/nf-debugapi-outputdebugstringw

OutputDebugStringW converts the specified string based on the current system locale information and passes it to OutputDebugStringA to be displayed. As a result, some Unicode characters may not be displayed correctly.
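
To illustrate the kind of loss the documentation describes, here is a minimal C# sketch. It assumes nothing about what OutputDebugStringW does internally: `Encoding.ASCII` is only a stand-in for "a code page that cannot represent these characters", and `LossyConversionDemo` and `RoundTrip` are hypothetical names used for illustration.

```csharp
using System;
using System.Text;

static class LossyConversionDemo
{
    // Encodes the text to bytes with the given encoding and decodes it back.
    // Characters the encoding cannot represent are replaced by its fallback
    // (a '?' for Encoding.ASCII).
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        // h + U+20E3 + k + U+20E3 + U+2302, written with escapes so the
        // source file's own encoding does not matter.
        string text = "h\u20E3k\u20E3\u2302";

        // Stand-in for a narrow code page (assumption, see above): lossy.
        Console.WriteLine(RoundTrip(text, Encoding.ASCII));        // h?k??

        // UTF-8 can represent every code point, so the round trip is lossless.
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text); // True
    }
}
```

A narrow code page turns the three non-ASCII code points into question marks, while the UTF-8 round trip preserves them, which is why the UTF-8 + OutputDebugStringA route avoids the problem.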

However, I've added a feature where you can choose the encoding used for ODS traces, and this can ultimately allow you to use Unicode strings, but only with some limitations:

  • you must use UTF-8 encoding when you call OutputDebugStringA
  • you must not use OutputDebugStringW at all
  • you must configure WpfTraceSpy to use UTF-8 as the "ODS encoding"

See full details here
https://github.com/smourier/TraceSpy?tab=readme-ov-file#unicode-support-for-outputdebugstring-ods-traces

Yes, this is about OutputDebugString
Interesting, if these are global limitations, then why do those symbols show fine even in the old DebugView app?

Anyway, the newer version works (it already had UTF-8 configured; I don't control which API the app calls since it's an external app, AutoHotkey, but I guess it's the correct one), though some combining chars are still buggy

this is TraceSpy
dbg1

This is Debugview, layout is still buggy, but the combining chars are visible
dbg2

Thanks for a quick fix!

I don't know what app you're talking about exactly but this https://learn.microsoft.com/en-us/sysinternals/downloads/debugview doesn't work with Unicode.

If you don't control the app and it's using OutputDebugStringW then there's nothing to do, it won't work in the general case.

You should post the exact input as text, not bitmaps; bitmaps are useless when talking about characters.

Yes, Debugview you linked to is the app I meant, and it does work with unicode, that's where the screenshot is from (the text coming from the app is h⃣k⃣⌂)

The other app I've tried is https://github.com/CobaltFusion/DebugViewPP, which also displays the chars like DebugView

AFAIK none of these work correctly with Unicode in general (and UTF-16 encoding in particular), they don't work for me. Give me a text and I'll show you how they render on my machine.

Maybe you mean UTF-8; that's not strictly the same thing as Unicode, and not the same thing as UTF-16. Maybe what happens is that you're using a code page on your machine which makes it work. As I said in the doc here https://github.com/smourier/TraceSpy?tab=readme-ov-file#unicode-support-for-outputdebugstring-ods-traces that is possible.

I've given you the text in the previous comment: h⃣k⃣⌂

I don't know the exact encoding paths, the file with this text is UTF-8, but AutoHotkey is a UTF-16 app, the code page on my machine is indeed UTF-8, and that's the setting in TraceSpy (guess it picked up the system version, I didn't change anything)

This is how this text shows up:

Screenshot 2024-03-06 145454
Screenshot 2024-03-06 145348

But I'm not sure about the text you give, this is how it looks in my Chrome looking at this question:

image

In Notepad (Windows 11)

image

In Visual Studio

image

In Visual Studio Code

image

If you can provide me a way to test it with autohotkey, I can have a deeper look.

DebugView++ is the same as DebugView here,

DBGVpp

Maybe it's a font issue? Though I tried different fonts and they display it fine. Or could this also be some font fallback system thing?

The Chrome/Notepad are semi-fine, wouldn't expect any heroics from these apps, but they do display the combining keycap around the key, which is close to the proper way

The AutoHotkey test is as simple as this script.ahk example file below:

#Requires AutoHotKey 2.0

!1::OutputDebug('⎇1OutputDebug: h⃣k⃣⌂')

Every time you press Alt+1 after launching this script, you should get the message ⎇1OutputDebug: h⃣k⃣⌂ printed via ODS
(I think AutoHotkey might still have two installation modes, ANSI and Unicode; I was using the latter)

So the text here, h⃣k⃣⌂, is composed of 5 Unicode code points:

0x0068 h
0x20E3 "Combining Enclosing Keycap" https://www.compart.com/en/unicode/U+20E3
0x006B k
0x20E3 "Combining Enclosing Keycap"
0x2302 "House" https://www.compart.com/en/unicode/U+2302
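
For anyone who wants to check such a breakdown on their own strings, here is a small C# sketch (assuming .NET Core 3.0 or later for `EnumerateRunes`; `CodePointDump` and `Describe` are hypothetical names). It lists each code point together with the UTF-8 bytes that would be passed to OutputDebugStringA.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CodePointDump
{
    // Returns one line per Unicode code point: its U+XXXX value and the
    // bytes it occupies once encoded as UTF-8.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();
        foreach (Rune rune in text.EnumerateRunes())
        {
            byte[] utf8 = Encoding.UTF8.GetBytes(rune.ToString());
            lines.Add($"U+{rune.Value:X4} -> UTF-8 {BitConverter.ToString(utf8)}");
        }
        return lines;
    }

    static void Main()
    {
        // Same text as above, written with escapes to be independent of
        // the source file encoding.
        foreach (string line in Describe("h\u20E3k\u20E3\u2302"))
        {
            Console.WriteLine(line);
        }
    }
}
```

For this text it reports U+0068 (68), U+20E3 (E2-83-A3), U+006B (6B), U+20E3 (E2-83-A3) and U+2302 (E2-8C-82), i.e. 5 code points but 11 UTF-8 bytes.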

Indeed, when we see question mark glyphs (in a box or w/o a box), it means the font (or the graphics layer) used doesn't handle the character, so this rendered text:

image
means the font isn't capable of displaying all the corresponding glyphs.

If I use, say, unifont https://www.unifoundry.com/unifont/index.html I can get the glyphs displayed when I do as instructed, i.e. UTF-8 + OutputDebugStringA

So, with this C# program:

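 using System.Diagnostics;
 using System.Runtime.InteropServices;
 using System.Text;

 static class Program
 {
     static void Main()
     {
         var str = "h⃣k⃣⌂";
         Trace.WriteLine(str);

         // one way of using it: let the marshaler encode the string as UTF-8
         OutputDebugStringA(str);

         // another way of using it (if UnmanagedType.LPUTF8Str is not available):
         // encode manually; GetBytes does not add a terminator, so append "\0"
         var bytes = Encoding.UTF8.GetBytes(str + "\0");
         OutputDebugStringA(bytes);
     }

     [DllImport("kernel32")]
     private static extern void OutputDebugStringA(byte[] str);

     [DllImport("kernel32")]
     private static extern void OutputDebugStringA([MarshalAs(UnmanagedType.LPUTF8Str)] string str);
 }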

This is what I see:

image

I can't get this displayed using DbgView or DebugView++. If you see the proper glyphs w/o using a specific font, then it's something installed or configured on your machine, like regional settings etc., that makes it work. Actually, this discussion in the DebugView++ issues is more or less the same: CobaltFusion/DebugViewPP#389

As for AutoHotkey, I don't know how to do the strict technical equivalent of what's done in C# (or in C/C++), i.e. OutputDebugStringA + UTF-8.

So, I will leave the code as is.
PS: no one should use OutputDebugString, it's a 30+ year old piece of crap.

If you see the proper glyphs w/o using a specific font, then it's something installed or configured on your machine, like regional settings etc., that makes it work.

Checked in an app that Tahoma system font has this symbol, and it indeed is also working in TraceSpy! Is TraceSpy maybe doing something special regarding font fallbacks compared to the DebugViewPP?

Or maybe it's indeed that setting (I have it enabled) image

but then again the question is why is this same setting not working for TraceSpy, but working for DebugViewPP?

I don't know, but actually it's the reverse for me, I can't get DebugView++ to display the proper glyph whatever I try.

Have you tried setting those UTF-8 system settings? Just checked, and indeed, without this enabled I get ??? instead in DebugView.

So I guess that explains it - TraceSpy is using a different path from this utf-8 system setting and thus doesn't have access to the system font fallback mechanism that is used automatically in that system setting path?

With UTF-8 as the default in Windows, WpfTraceSpy will choose UTF-8 as its default encoding instead of the ANSI encoding (which depends on the code page). So with the same C# program, we see the glyphs displayed, but the 0x20E3 characters are not combined (this is more a limitation of WPF, I think).

image
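
To check which situation a given machine is in, the active ANSI code page can be queried with the Win32 `GetACP` function; 65001 means UTF-8 is the system default. The sketch below is Windows-only and is not WpfTraceSpy's actual selection logic; `OdsEncodingGuess` and `DescribeCodePage` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

static class OdsEncodingGuess
{
    // Win32 API: returns the current ANSI code page identifier.
    [DllImport("kernel32")]
    private static extern uint GetACP();

    // Code page identifier Windows uses for UTF-8.
    private const uint Utf8CodePage = 65001;

    // Describes what a code page id means for ODS traces sent as "ANSI" text.
    public static string DescribeCodePage(uint codePage)
    {
        return codePage == Utf8CodePage
            ? "UTF-8"
            : $"ANSI code page {codePage}";
    }

    static void Main()
    {
        // Requires Windows, since GetACP lives in kernel32.
        Console.WriteLine(DescribeCodePage(GetACP()));
    }
}
```

If this prints "UTF-8", the system-wide UTF-8 setting is active and "ANSI" and UTF-8 coincide; otherwise the ODS encoding has to be set to UTF-8 explicitly in the options.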

So Unicode support works (rendering is another thing). For example, if I send these Chinese characters "我们一起去玩吧。" (with the same C# code and Lucida Console), I see this, which is correct:

image

Are you sure the first screenshot doesn't show blank boxes, signifying that the char is not found, instead of the combining keycap? Maybe check with another combining char like ̥ (U+0325)

Here (output from autohotkey) I see that the ring char is displayed and combined while the keycap is not displayed and, naturally, not combined

trace1

Is fixed for me.

yg-i commented

FWIW this worked great on my machine:

you configure WpfTraceSpy to use UTF-8 as the "ODS encoding", like shown here (menu "Options"/"ODS Encoding...", by default the encoding is the default ANSI encoding):

image

FWIW this worked great on my machine:

have you tried the combining keycap char?

yg-i commented

Here's how it looks with Tahoma font (I assume this is the appearance you want).[1] FWIW, I'm pretty convinced this is not an encoding but a font issue. If I copy and paste your h⃣k⃣⌂ into Microsoft Word, and try various different fonts, I find that pretty much every font displays this differently, and only Tahoma produces your desired appearance. Even in Chrome and Firefox, without forcing the Tahoma font, they look all tangled up, like in [2].

[1]
image

[2]
image

Here's how it looks with Tahoma font (I assume this is the appearance you want).[1]

No, try it without the Tahoma font manually selected. It should work without users explicitly selecting the font as it does on DebugView++ and Word and all the other apps - they use system font fallback to display these symbols in the font that has them even if the currently selected font doesn't have them

↓ this is without Tahoma selected, the base font is a different, monospaced font
DBGVpp

FWIW, I'm pretty convinced this is not an encoding but a font issue

It is and has been noted above, "Checked in an app that Tahoma system font has this symbol, and it indeed is also working in TraceSpy! Is TraceSpy maybe doing something special regarding font fallbacks compared to the DebugViewPP?"

But it's still not "working great" since font fallback is NOT working

yg-i commented

It should work without users explicitly selecting the font as it does on DebugView++ and Word and all the other apps

It doesn't work in Microsoft Word (or Chrome, or Firefox) on my machine without using Tahoma font.

image

If you run Get-Culture in PowerShell, do you have en-US as the output? If you have a different locale, font fallback behavior could be different (see electron/electron#18829 for some discussion), and it could also explain why DebugView/DebugView++ can display Unicode correctly on your machine (owing to the locale sensitivity of OutputDebugStringW that @smourier pointed to earlier)

What do you mean "it doesn't work" when in the "Verdana"-labeled text you have the same working thing - a combining keycap correctly displayed as a combining keycap (though not perfectly overlapping the h key)?

(by the way, you can actually click after h and check the font in Word: you'll see that it's not Verdana anymore but Tahoma, so font substitution silently works)

If you run Get-Culture in PowerShell, do you have en-US as the output?

yes

and it could also explain why DebugView/DebugView++ can display Unicode correctly on your machine

No, you linked to a different issue. In that case the font sub was working, just using the wrong font, but still a font with a valid glyph for this symbol. In this case the font with a valid glyph exists, but is NOT used, instead displaying a generic "not found" box