dotnet/interactive

Polyglot Notebook: [NetTestingE2E][GB18030] Some GB18030 strings display incorrectly in Output, shown as Unicode escape sequences (e.g. \uD86D\uDCE7...).

Describe the bug:
Some GB18030 strings (e.g. 𫓧𬬮U1U2) display as \uD86D\uDCE7\uD872\uDF2E\uE05E\uE05F\uE070\uE081U1\uE2C1\uE2C2\uE2C3\uE2D4\uE2E5U2\uE546\uE547\uE548\uE559\uE55A in Output.

Testing Data: Level2 GB18030-2022 Testing Data for medium large amount cases-GB18030
Group3:舰剑饯渐溅建僵齄鿀龬ɑπ㈢Q𫓧𬬮U1U2U3()ao㩹㩺㩻㩼㩽䀃E9;cz囌囍囎囐囓囑囒㏄㏑⿲⿳⿻〇cz珸珹䲟珺珻珼陫

Note:

  1. Repro VM: 172.16.194.187
  2. Tested on Windows 11 24H2 ZH-CN (Chinese (Simplified) localized OS)

Pre-steps:
1. On a Chinese OS, install VS Code and the dotnet-interactive-vscode-1.0.6323011.vsix extension.
2. Install the Chinese (Simplified) language pack in VS Code, then change the display language of VS Code to Chinese (Simplified).

Steps:

  1. Ctrl+Shift+P => "Polyglot Notebook: Create new blank notebook"
  2. Select "Create as .dib" -> Select "C#"
  3. Set the cell contents to the following and execute the cell:

var value = new { Name = "Developer舰剑饯渐溅建僵齄鿀龬ɑπ㈢Q𫓧𬬮U1U2U3()ao㩹㩺㩻㩼㩽䀃E9;cz囌囍囎囐囓囑囒㏄㏑⿲⿳⿻〇cz珸珹䲟珺珻珼陫", Salary = 42 };
value.Display("text/html", "application/json");

Actual Results:
Some GB18030 strings (e.g. 𫓧𬬮U1U2) display as \uD86D\uDCE7\uD872\uDF2E\uE05E\uE05F\uE070\uE081U1\uE2C1\uE2C2\uE2C3\uE2D4\uE2E5U2\uE546\uE547\uE548\uE559\uE55A in Output.
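
A quick way to check whether the underlying data is intact is to decode the escaped text back into characters. The cell below is a minimal sketch, not part of the original repro: it takes the first two escaped surrogate pairs from the output above, decodes them with `Regex.Unescape`, and prints the resulting code points. The last two lines rest on an assumption that is not confirmed in this report, namely that the `application/json` output is produced with System.Text.Json; they serialize the decoded string with default options so the result can be compared with what the notebook shows.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Escaped text copied from the beginning of the Actual Results output.
string escaped = @"\uD86D\uDCE7\uD872\uDF2E";

// Regex.Unescape turns \uXXXX escapes back into UTF-16 code units.
string decoded = Regex.Unescape(escaped);

// Walk the string by code point; a well-formed surrogate pair spans two chars.
for (int i = 0; i < decoded.Length; i += char.IsSurrogatePair(decoded, i) ? 2 : 1)
{
    Console.WriteLine($"U+{char.ConvertToUtf32(decoded, i):X}");
}

// Assumption: the JSON view is produced by System.Text.Json with default options.
// Serializing the decoded text shows how that serializer renders these characters.
string json = JsonSerializer.Serialize(new { Name = decoded });
Console.WriteLine(json);
```

If the printed code points match the characters in the test string, the value itself is not corrupted and only its rendering in the Output is affected.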

Expected Results:
All strings should display correctly in VSCode UI.

Please complete the following:

Which version of .NET Interactive are you using? (In a notebook, run the #!about magic command.):

  • OS
    • [√ ] Windows 11
    • Windows 10
    • macOS
    • Linux (Please specify distro)
    • iOS
    • Android
  • Browser
    • Chrome
    • Edge
    • Firefox
    • Safari
  • Frontend
    • Jupyter Notebook
    • Jupyter Lab
    • nteract
    • [√ ] Visual Studio Code
    • Visual Studio Code Insiders
    • Visual Studio
    • Other (please specify)