galkahana/PDF-Writer

Perfectly kosher bug :)

stribor opened this issue · 13 comments

I stumbled on a really odd one, on macOS Helvetica Neue ttc font has glyph with ID 65535 (not sure how/why and if other fonts have this problem) and then following line overflows unsigned short:

mSubsetFontGlyphsCount = subsetGlyphIDs.back() + 1;

Something like:
mSubsetFontGlyphsCount = subsetGlyphIDs.back() == 65535 ? 65535 : subsetGlyphIDs.back() + 1;
fixes problem, but maybe there is a better solution?

Thank you. If 65535 is an eligible id then i should probably not be using ushort, but rather a larger unsigned. Ill check. Thanks.

And in any case check the number and not cause an overflow.

im adding checks against glyphid 0xffff here - #256.
maxp numglyphs is defined as uint16, meaning the max values is 0xffff, and so max glyph index is oxffff-1.

so now...let's see how you got a glyph id with 65535. checked on my own mac each of the different 13 fonts in macOS Helvetica Neue ttc and im not seeing glyph id 65535. all have much lower counts (around 2K glyphs).

i wonder if you may possible provide your own copy...or even better...and example script to show the problem. especially cause the correction im proposing is aiming to block such a case rather than resolve it.

Maybe this isn't an issue anymore and was fixed. I was updating our app to the latest version of PDF Writer.
And I was going through changes, had years ago it and seen that in git commit from 2021. Unfortunately, I don't have the original bug report anymore, and I can't reproduce the problem anymore.
I tried to look for documents I had from around the time.

But there is something odd with embedding (maybe it should go to different bug report). While trying to reproduce glyph id with 65535 and playing with Helvetica Neue.ttc (the one that comes with macOS in /Library/Fonts/) I noticed lots of square boxes instead of text when using PageContentContext::Tj(const GlyphUnicodeMappingList& inText)

I had to resurrect my hack/workaround from 2017:

bool WrittenFontTrueType::AddToANSIRepresentation(	const GlyphUnicodeMappingList& inGlyphsList,
													UShortList& outEncodedCharacters)
{
#if 1
	// Always disable ANSI representation so it will fallback to CID so ligatures won't crash it and some text would be wrongly displayed (I guess pray part doesn't work in all cases:) )
	return false;
#else
...

perhaps. with source font file and script to recreate the problem, a-la one of the tests so i can recreate, figure out what's wrong and correct. That would be lovely

Font file comes with macOS:
/System/Library/Fonts/HelveticaNeue.ttc
Version is 17.0d2e1

I copied it along other fonts to PDFWriterTesting/Materials/fonts/

and modified CIDSetWritingCFF test:

Index: PDFWriterTesting/CIDSetWritingCFF.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/PDFWriterTesting/CIDSetWritingCFF.cpp b/PDFWriterTesting/CIDSetWritingCFF.cpp
--- a/PDFWriterTesting/CIDSetWritingCFF.cpp	(revision 4b59b8240bd5f7ea75af68e77ef9a53c050bfae5)
+++ b/PDFWriterTesting/CIDSetWritingCFF.cpp	(date 1718110943000)
@@ -52,12 +52,25 @@
 		AbstractContentContext::TextOptions textOptions(pdfWriter.GetFontForFile(
 																			BuildRelativeInputPath(
                                                                             argv,
-                                                                            "fonts/KozGoPro-Regular.otf")),
+                                                                            "fonts/HelveticaNeue.ttc")),
 																			14,
 																			AbstractContentContext::eGray,
 																			0);
 
-		cxt->WriteText(75,600,"hello \xe6\x9c\x9d",textOptions);
+		cxt->WriteText(75,600,"Helvetica Neue",textOptions);
+
+		GlyphUnicodeMappingList guml;
+		guml.push_back(GlyphUnicodeMapping(47, 72));
+		guml.push_back(GlyphUnicodeMapping(76, 101));
+		guml.push_back(GlyphUnicodeMapping(83, 108));
+		guml.push_back(GlyphUnicodeMapping(93, 118));
+		guml.push_back(GlyphUnicodeMapping(76, 101));
+		cxt->BT();
+		cxt->k(0,0,0,1);
+		cxt->Tm(1,0,0,1,10,600);
+		cxt->Tf(textOptions.font, 14);
+		cxt->Tj(guml);
+		cxt->ET();
 
 		status = pdfWriter.EndPageContentContext(cxt);
 		if(status != eSuccess)

With this I get CIDSetWritingCFF_broken.pdf with no text visible, and if I do:

Index: PDFWriter/WrittenFontTrueType.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/PDFWriter/WrittenFontTrueType.cpp b/PDFWriter/WrittenFontTrueType.cpp
--- a/PDFWriter/WrittenFontTrueType.cpp	(revision 4b59b8240bd5f7ea75af68e77ef9a53c050bfae5)
+++ b/PDFWriter/WrittenFontTrueType.cpp	(date 1718111392000)
@@ -54,6 +54,7 @@
 	// i'll need to make something different out of the text.
 	// as you can see this has little to do with glyphs (mainly cause i can't use FreeType to map the glyphs
 	// back to the rleevant unicode values...but no need anyways...that's why i carry the text).
+	return false;
 	UShortList candidates;
 	BoolAndByte encodingResult(true,0);
 	WinAnsiEncoding winAnsiEncoding;

I get CIDSetWritingCFF_readable.pdf

CIDSetWritingCFF_readable.pdf
CIDSetWritingCFF_broken.pdf

HelveticaNeue.ttc.zip

awesome. thanks man. i'll try to look at it soon. maybe i @#$@#$ up the ansi path from glyphs. it's neat to see all these 14yrs old bugs coming up haha. thanks!

oh my. read the pdf specs again and some of ttf. bottom line i need to verify that the font has particular cmaps defined in it (ms unicode or Mac Roman) otherwise i cant use winansi encoding and have to revert to cid rep. or yknow i could just not bother with attempting a winansi implementation at all. which is what your workaround does. i reckon i'll try to stick with winansi but verify the existance of those cmaps and only then do ansi. otherwise cid. this means that in neue case, which doesn't have this cmap, it'll go to CID, while in arial.ttf thats in the samples materials (which has both) will get to keep ansi.

ok...i'll need a couple of more days to get to write this code, but shouldn't be long.

closed automatically cause i reffed from PR. anyways. i think it should be fine now

Gonna close this. If theres anything else we can reopen

I got bug report from user about same broken behavior with https://fonts.google.com/specimen/Vollkorn font, once downloaded I used ttf's from static folder. (Side question, any way of supporting variable fonts?).

PDF gets generated fine when FontHasCmapsForWinAnsiEncoding returns false

hmm. i wasn't able to recreate the problem. tried Vollkorn-reguler.ttf with simple WriteText("Vollkorn regular") and it came out fine. did verify im getting an ansi writing in the result pdf. maybe there's something more specific with a particular glyphs usage? if you can find out more details we can try to figure this one out.

RE variable fonts. in don't know much about them...so i don't really know if they can be used as is or whether there's a simple workaround to make the lib read them as plain ttf/otf/type1/dfont.

I tried to narrow it down to be able to reproduce it and what triggers it seems specifying glyphs

i.e. above example for Helvetica, with Vollkorn font:

		AbstractContentContext::TextOptions textOptions(pdfWriter.GetFontForFile(
										BuildRelativeInputPath(
                                                                            argv,
                                                                            "fonts/Vollkorn-Regular.ttf")),
																			14,
																			AbstractContentContext::eGray,
																			0);

		cxt->WriteText(75,600,"Rüc Res",textOptions);

		GlyphUnicodeMappingList guml;
		guml.push_back(GlyphUnicodeMapping(256, 82));
		guml.push_back(GlyphUnicodeMapping(505, 252));
		guml.push_back(GlyphUnicodeMapping(321, 99));
		cxt->BT();
		cxt->k(0,0,0,1);
		cxt->Tm(1,0,0,1,10,600);
		cxt->Tf(textOptions.font, 14);
		cxt->Tj(guml);
		cxt->ET();

		status = pdfWriter.EndPageContentContext(cxt);

breaks output if FontHasCmapsForWinAnsiEncoding(FT_Face font) returns true;

also note that: cxt->Tj(guml); affect also output of WriteText.

I mostly use AbstractContentContext::TJ and AbstractContentContext::Tj for text output and once problem is triggered I get whole text even following pages wrongly output.