Panic on Split text with Unicode

Question

lordofscripts opened this issue 8 months ago · 0 comments

The source document (novel manuscript) I use has Unicode characters like:

Object Replacement Character (U+FFFC)
Guillemets/chevrons «» (U+00AB, U+00BB)
Em dash — (U+2014)

I am using the standard fonts; Arial is the current font at the moment the panic occurs. While formatting, I try to indent a block of text, and for that I call:

pdf.SplitText(textBlock, workingWidth)

If the text is plain ASCII, nothing bad happens. If the text has Unicode characters, FPDF panics at SplitText.go line 31 with the following message:

panic: runtime error: index out of range [8212] with length 256.

8212 is the decimal value of the Unicode em dash character (\u2014). This issue originally occurred when the text contained the **Object Replacement Character** (\ufffc), but I handled that gracefully by doing the necessary object substitutions.
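To see in advance which characters would trip the index check, one option is to scan the text before calling SplitText. The following is only a sketch in plain Go, not part of the library: `outOfRangeRunes` is a hypothetical helper name, and the table length of 256 is taken from the panic message above.

```go
package main

// outOfRangeRunes is a hypothetical helper, not part of the FPDF library.
// It returns, in order of first appearance, each distinct rune in text
// whose code point is >= tableLen, i.e. a rune that could not be used as
// an index into a table with tableLen entries.
func outOfRangeRunes(text string, tableLen int) []rune {
	seen := make(map[rune]bool)
	var out []rune
	// Ranging over a string yields Unicode code points (runes), not bytes.
	for _, r := range text {
		if int(r) >= tableLen && !seen[r] {
			seen[r] = true
			out = append(out, r)
		}
	}
	return out
}
```

Called as `outOfRangeRunes(textBlock, 256)`, it would report the em dash (8212) and the Object Replacement Character (65532), but not the guillemets, whose code points (171 and 187) are below 256.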

However, in creative writing the other Unicode characters I mentioned above are crucial for proper manuscript formatting. Unfortunately, the library's pdf.SplitText() method is unable to process them.

From what I can see, Arial and Times do have that Unicode character. And from what I see in SplitText, it gets the character width from the current font, but that width table has only 256 entries, which is not realistic for Unicode.
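The failure mode described here (a rune used directly as an index into a 256-entry width table) can be illustrated with a bounds-checked lookup. This is a minimal sketch, not the library's actual code: `runeWidth`, `textWidth`, the `cw` table, and the fallback width are all hypothetical names and values chosen for illustration.

```go
package main

// runeWidth looks up r in cw, a per-character width table, and returns
// fallback when r lies outside the table instead of panicking.
// Hypothetical helper; cw stands in for a font's 256-entry width table.
func runeWidth(cw []int, r rune, fallback int) int {
	if r >= 0 && int(r) < len(cw) {
		return cw[r]
	}
	return fallback
}

// textWidth sums the widths of all runes in s, using fallback for any
// rune that has no entry in cw.
func textWidth(cw []int, s string, fallback int) int {
	total := 0
	for _, r := range s {
		total += runeWidth(cw, r, fallback)
	}
	return total
}
```

With a 256-entry table, `runeWidth(cw, '—', 1000)` returns the fallback of 1000 rather than indexing position 8212. A fallback only avoids the panic; the measured width of such characters is then an estimate, so line breaks may be slightly off.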