Seek documentation that supports Chinese and Japanese character sets

Question

Seek documentation that supports Chinese and Japanese character sets

khlipeng opened this issue 7 years ago · 6 comments

Answer 1 · 2017-05-05T10:34:03.000Z

This is, what, the third request for more language support in a row? 1, 2

PDF, as a standard, does not really do Unicode or UTF-8, for the simple reason that its text model is inherited from PostScript, which is older than Unicode. Instead it uses "code pages", each of which covers a subset of the Unicode standard. For example, the "BIG5" code page used for traditional Chinese is a two-byte encoding; the "GBK" code page for simplified Chinese is one or two bytes; the "Shift JIS" code page for Japanese is one or two bytes. There are set mappings between each of these encodings and the full Unicode space, but none of the code pages can represent the entire Unicode space.
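To make the "subset" point concrete, here is a rough sketch in Go using only the standard library. The `toyCodePage` table and the `encodeToy` function are made up for illustration (they are not part of this library or of any real code page); the idea is just that a fixed mapping can encode the runes it knows about and has to give up on everything else.

```go
package main

import "fmt"

// toyCodePage is a hypothetical, deliberately tiny single-byte code page:
// ASCII is passed through, plus three accented letters. Real code pages
// such as Big5, GBK or Shift JIS are far larger, but work on the same idea.
var toyCodePage = map[rune]byte{
	'é': 0xE9,
	'ü': 0xFC,
	'ñ': 0xF1,
}

// encodeToy converts a UTF-8 Go string into toy code-page bytes.
// It stops with an error at the first rune the code page cannot represent.
func encodeToy(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r < 0x80 {
			// ASCII maps to itself.
			out = append(out, byte(r))
			continue
		}
		b, ok := toyCodePage[r]
		if !ok {
			return nil, fmt.Errorf("rune %q (U+%04X) is not in the code page", r, r)
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	for _, s := range []string{"café", "日本語"} {
		b, err := encodeToy(s)
		if err != nil {
			fmt.Printf("%q: %v\n", s, err)
			continue
		}
		fmt.Printf("%q: % X\n", s, b)
	}
}
```

The first string fits in the toy table; the second does not, which is the situation a Chinese or Japanese string is in when the selected code page is a Western one.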

Rather than any one individual language, the project needs to be able to handle the full Unicode range. Ideally, you want to be able to pass the library a Go string, and it works out all the complications of code pages etc. internally. But having looked into it, I can tell you that doing that properly would be A LOT of work. It's not an easy fix for Chinese or Arabic, it's a big project in its own right.

Answer 2 · 2017-05-05T10:42:12.000Z

Good summary, @marcusatbang

Mind if I copy it into the documentation to put the issue out in front?

Answer 3 · 2017-05-05T10:49:30.000Z

@marcusatbang @jung-kurt Thanks

Answer 4 · 2017-05-05T10:50:10.000Z

Go ahead, @jung-kurt. I only wish I had any spare time to actually help.

Answer 5 · 2018-07-02T09:55:22.000Z

@jung-kurt Thanks for the fantastic library. I have invested a lot of time building a PDF template using this and just now realised it's not fully UTF-8 compatible. It'd be a great help if you could highlight this missing feature in some sort of warning box in the README for future users like me :)

Answer 6 · 2018-07-02T10:12:57.000Z

Good idea. I'd like to put the last paragraph of the Features section into a warning box, but I am not sure how that is done with GitHub Markdown. Any ideas?