amiremohamadi/DuckX

UTF-8 Support

s0bes opened this issue · 2 comments

s0bes commented

The function bool duckx::Run::set_text(const char *text) writes text into document.xml without issues for English characters.
When I try to write Russian letters (non-ASCII characters), the call succeeds, but MS Word cannot open the resulting .docx file and shows an error.
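Some background on why English letters work and Russian ones may not: in UTF-8, an ASCII character such as J takes one byte, while a Cyrillic letter such as Ж (U+0416) takes two bytes (0xD0 0x96). Russian text is therefore multi-byte rather than "wide" in the wchar_t sense, and it can travel through a const char* as long as those bytes really are UTF-8. The sketch below is a standalone illustration of the encoding rules, not part of DuckX; encode_utf8 is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper (not a DuckX API): encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates (U+D800..U+DFFF) or values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte, e.g. 'J' -> 0x4A
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes, e.g. U+0416 'Ж' -> D0 96
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return {};          // surrogates are not valid scalar values
        out += static_cast<char>(0xE0 | (cp >> 12));          // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));          // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;                                                // empty if cp > U+10FFFF
}
```

For example, encode_utf8(U'J') yields the single byte "J", while encode_utf8(0x0416) yields the two bytes "\xD0\x96".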

What I tried (stages):

  1. Create a .docx with a single Russian letter (e.g. Ж).
  2. Replace the Russian letter with an English one (e.g. J) using set_text.
  3. Save the document - DOCX OPENS WITH NO ERRORS
  4. Replace the English letter with a Russian one (J -> Ж).
  5. Save the document - DOCX OPENS WITH AN ERROR
  6. Replace the Russian letter with the English one (Ж -> J).
  7. Save the document - NO ERRORS

I compared the original file (stage 1, created manually with the Russian letter) with the file containing the English letter (stage 3 or 7). The only file inside the archive that changed is document.xml.

Stage 1 - document.xml is UTF-8 encoded and starts with the header <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Stages 2-7 - document.xml is now ANSI-encoded and the header from stage 1 is gone. The letter itself is written correctly inside <w:t>Ж</w:t>, but because of the ANSI encoding and the removed header, the file cannot be opened when it contains non-ASCII characters. Ordinary (ASCII) characters cause no issues.
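One possible explanation, offered as an assumption rather than a confirmed diagnosis: under the XML 1.0 specification, a document with neither an encoding declaration nor a byte-order mark must be UTF-8, so a missing header is not by itself an error. The file would be rejected, however, if the bytes passed to set_text were in a legacy code page such as Windows-1251, where Ж is the single byte 0xC6, which is not valid UTF-8. A quick way to check the input before calling set_text is a validator like the hypothetical is_valid_utf8 below (a standalone sketch, not a DuckX function).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a DuckX API): return true if `s` is well-formed UTF-8.
// Rejects truncated sequences, stray continuation bytes, overlong forms,
// surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }                 // ASCII byte
        std::size_t len;
        char32_t cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // continuation byte or 0xF8..0xFF as lead
        if (i + len > n) return false;                   // sequence cut off at end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // every trailing byte must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                                // overlong, out of range, or surrogate
        i += len;
    }
    return true;
}
```

With this check, "\xD0\x96" (Ж in UTF-8) passes, while "\xC6" (Ж in Windows-1251) fails. If the text being written fails the check, converting it to UTF-8 first, or compiling the source as UTF-8 so that string literals are emitted as UTF-8 (for example with MSVC's /utf-8 option), would be a reasonable thing to try.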

Is there something I am missing in the library configuration, or is this a bug?

You could try replacing minizip with libzip.

Same question here.