pdf-rs/pdf

is_cid() gives wrong result for Type0 fonts

badicsalex opened this issue

I'm using is_cid() to determine whether I need to read 1-byte or 2-byte chunks from the string operands of text operations. It is currently implemented as:

```rust
matches!(self.data, FontData::CIDFontType0(_) | FontData::CIDFontType2(_))
```
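
To illustrate why the result of is_cid() matters, here is a minimal sketch of splitting the string operand of a text-showing operator (such as Tj) into character codes. It assumes fixed 2-byte big-endian codes for CID fonts (as with the Identity-H encoding) and 1-byte codes otherwise. The function `split_codes` is hypothetical and not part of pdf-rs; a real decoder would consult the font's CMap, which can define variable-length codes.

```rust
/// Hypothetical helper (not a pdf-rs API): split a text string into
/// character codes. Assumes 2-byte big-endian codes when `is_cid` is true
/// and 1-byte codes otherwise.
fn split_codes(bytes: &[u8], is_cid: bool) -> Vec<u16> {
    if is_cid {
        // chunks_exact drops an incomplete trailing byte;
        // a real decoder should report it instead.
        bytes
            .chunks_exact(2)
            .map(|c| u16::from_be_bytes([c[0], c[1]]))
            .collect()
    } else {
        bytes.iter().map(|&b| u16::from(b)).collect()
    }
}

fn main() {
    let s = [0x00, 0x24, 0x00, 0x25];
    // Read as 2-byte codes: two glyphs.
    println!("{:?}", split_codes(&s, true)); // [36, 37]
    // Misread as 1-byte codes: four codes, two of them bogus zeros.
    println!("{:?}", split_codes(&s, false)); // [0, 36, 0, 37]
}
```

If is_cid() returns false for a font whose text actually uses 2-byte codes, every string is split into twice as many codes as intended, as the second call shows.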

I've come across a file with a Type0 font (cropped version: https://stickman.hu/junk/cid_issue.pdf) whose descendant is a CIDFontType0 font, and all text rendered with this font uses 2-byte character codes. Since self.data is FontData::Type0 here, is_cid() returns false. The surrounding code apparently handles this situation for other queries, but is_cid() does not.

I also see various workarounds in the sibling project (setting is_cid when there is a cid_to_gid_map, or when it's identity), but they seem counterintuitive to me.

According to the PDF standard v1.7, Section 9.7.1 NOTE 1:

PDF supports only a single descendant, which shall be a CIDFont.

This means the following implementation should be correct for PDFs (though not necessarily for PostScript):

```rust
matches!(self.data, FontData::Type0(_) | FontData::CIDFontType0(_) | FontData::CIDFontType2(_))
```

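
A self-contained model of the proposed check follows. The `FontData` enum and `Font` struct here are simplified stand-ins, not the actual pdf-rs types: the real variants carry parsed font data, and the `Type1` and `TrueType` variants are placeholders assumed for illustration.

```rust
/// Simplified stand-in for pdf-rs's `FontData`; the real variants carry
/// parsed font data, which is omitted here.
#[allow(dead_code)]
enum FontData {
    Type0,
    Type1,
    TrueType,
    CIDFontType0,
    CIDFontType2,
}

/// Hypothetical minimal font wrapper holding only the subtype.
struct Font {
    data: FontData,
}

impl Font {
    /// Proposed rule: in PDF a Type0 font has exactly one descendant,
    /// which must be a CIDFont, so it is treated as CID-keyed as well.
    fn is_cid(&self) -> bool {
        matches!(
            self.data,
            FontData::Type0 | FontData::CIDFontType0 | FontData::CIDFontType2
        )
    }
}

fn main() {
    let type0 = Font { data: FontData::Type0 };
    let type1 = Font { data: FontData::Type1 };
    assert!(type0.is_cid()); // false with the current rule, true with the proposed one
    assert!(!type1.is_cid()); // simple fonts stay single-byte
    println!("ok");
}
```

The match only inspects the top-level subtype and never has to look at the descendant, relying on the spec's guarantee that a Type0 font's single descendant is always a CIDFont.
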
s3bk commented

It is possible that you found quite a bug there.
I have been wondering why things were a bit wonky.