is_cid() gives wrong result for Type0 fonts
badicsalex opened this issue · 1 comments
I'm using is_cid() to determine whether I need to read 1-byte or 2-byte chunks from the text operation:
Line 212 in 4f18fab
I've come across a file which has a Type0 font (cropped version: https://stickman.hu/junk/cid_issue.pdf ) containing a CIDFontType0 descendant font, and all the text rendered with this font uses 2-byte character codes. The surrounding code apparently checks for similar situations in other queries, but not in is_cid().
I also see that there are various workarounds in the sibling project (setting is_cid if there is a cid_to_gid_map or when it's identity), but that's a bit counterintuitive to me.
According to the PDF standard v1.7, Section 9.7.1 NOTE 1:
PDF supports only a single descendant, which shall be a CIDFont.
This means that the following check should be correct for PDFs (though not necessarily for PostScript):
matches!(self.data, FontData::Type0(_) | FontData::CIDFontType0(_) | FontData::CIDFontType2(_))
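As a rough sketch of how that check would feed into the 1-byte vs 2-byte decision: the `FontData` and `Font` types below are simplified stand-ins I made up for illustration (the real variants carry font data), and `char_codes` is a hypothetical helper, not something from the crate. It also assumes fixed 2-byte big-endian codes, which holds for Identity-H style encodings but not for every CMap.

```rust
// Simplified, hypothetical stand-ins for the library's font types.
// The real variants wrap the parsed font dictionaries.
#[allow(dead_code)]
enum FontData {
    Type1,
    Type0,
    CIDFontType0,
    CIDFontType2,
}

struct Font {
    data: FontData,
}

impl Font {
    /// Proposed check: a composite (Type0) font is treated as CID-keyed,
    /// because in PDF its single descendant is always a CIDFont.
    fn is_cid(&self) -> bool {
        matches!(
            self.data,
            FontData::Type0 | FontData::CIDFontType0 | FontData::CIDFontType2
        )
    }

    /// Hypothetical helper: splits the raw bytes of a text-showing operand
    /// into character codes. Assumes fixed 2-byte big-endian codes for CID
    /// fonts; a trailing odd byte is silently dropped by `chunks_exact`.
    fn char_codes(&self, bytes: &[u8]) -> Vec<u16> {
        if self.is_cid() {
            bytes
                .chunks_exact(2)
                .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
                .collect()
        } else {
            bytes.iter().map(|&b| u16::from(b)).collect()
        }
    }
}
```

With this, the bytes `00 41 01 02` shown in a Type0 font come out as two codes rather than four, while a simple font still yields one code per byte.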
It is possible that you found quite a bug there.
I have been wondering why things were a bit wonky.