prawnpdf/ttfunk

OTF Support

camertron opened this issue · 25 comments

Hey there maintainers! I've been working for the last several months on supporting OpenType fonts, which really just means supporting the CFF font table. This is no small task, so the resulting PR was gigantic (git diff --stat tells me 46 files changed, 2990 insertions, 296 deletions). For digestibility I've tried to split it up into bitesized chunks (there are 13 of them) that I will work on submitting as long as you have the patience to review them. The branches have been created off each other in cascading fashion. I will have to submit them as individual pull requests one at a time. The following list serves as a manifest of the branches and should let us track what's been done so far:

I have successfully subsetted several of the Noto and STIX fonts, which I think have been pretty good litmus tests as they are very large and contain good examples of the various CFF sub tables and features. I used an open-source tool called Font-Validator (initially written by the folks at Microsoft) to validate the results.

Please let me know how I can help futher!

@camertron Thank you for your contribution and willingness to undertake this huge effort.

I suspect that these PR are not quite useful in isolation. Do you want me to create a branch off of master so that we could work through all of them without disturbing master in the process and afterwards I'll merge that branch.

@pointlessone Yeah, most of them are not useful in isolation. I was thinking of creating pull requests one at a time, but if you're willing to review each branch and merge them into an intermediate branch, that would be awesome! The only thing I'm worried about is where the conversation would happen, since there wouldn't really be a way to comment on individual lines of code.

@camertron I created otf branch. Let's work on that one until all PRs are merged. I'll merge that branch into master when it's done. We'll have discussions in PRs and meanwhile keep master usable.

Ok awesome :) I'll change the target of my first pull request to the otf branch.

@pointlessone As I was working on getting the next feature tested and up for review (cff_charset), I ran into a problem. It's very difficult to test the rest of the CFF features individually because they are all connected. I am therefore going to combine the remaining cff_* branches into a single branch called cff_rest and submit it all together.

@camertron I appreciate your effort to make it easier for me but ultimately there’s no requirement for that so go ahead!

@pointlessone the day has finally arrived! The otf branch should be ready to go! I assume you'll want to rebase first. Anything I can help with?

@camertron Thank you 1000x! I'll take it from here.

One thing I ask you you to help with is to write up a list of features we get with this branch. I imagine we don't have 100% OTF support yet.

Absolutely. Here's the list:

  • Added support for CFF-flavored fonts (also known as CID-keyed or OpenType fonts).
  • Added support for the VORG and DSIG tables.
  • Improved charset encoding support.
  • Improved font metrics calculations in the head, maxp, hhea, hmtx, and os/2 tables.
  • Subsetted fonts verified with Font-Validator, fontlint, and Mac OS's Font Book.

I can describe the internal changes and additions I made to support CFF fonts if you like, but they're not really things users of the lib will need or want to know about.

The two biggest OpenType tables we don't yet support (like, at all) are GSUB and GPOS. I have branches going for them, but... wow they are complicated. I think they're really only necessary to support features like cursive (eg. Arabic, Farsi, Urdu, etc) and vertically written fonts, but it would still be nice to support them at some point.

This is good. Thanks.

@camertron is this method ever used?

def to_svg
path_data = commands.map do |command|
instr = case command[0]
when :move then 'M'
when :line then 'L'
when :curve then 'C'
when :close then 'Z'
end
"#{instr}#{format_values(command)}"
end.join(' ')
"<path d=\"#{path_data}\"/>"
end

@pointlessone i used it for testing purposes, but it’s not used internally.

@camertron Are there any other pieces of code like this?

@pointlessone I don't think so? Honestly not sure, it's been so long...

This is a great feature. I am looking forward to this new feature.

When can you release a new version? The previous version was two years ago.

I just tried the new release with Prawn and ghostscript complains about TT instructions it doesn't understand.

Failed to interpret TT instructions in font 34a264+ArialMT. Continue ignoring instructions of the font.

Here's how to reproduce:

  1. Install prawn and ttfunk 1.6.0

  2. Run this script (update the location of the TTF file as needed):

    require 'prawn'
    
    Prawn::Document.generate 'test.pdf' do
      def register_font data
        font_families.update(data.each_with_object({}) do |(key, val), accum| 
          accum[key.to_s] = val
        end)
      end
    
      register_font Arial: { normal: '/usr/share/fonts/msttcore/arial.ttf' } 
      font 'Arial'
      text 'foobar'
    end

3. Run ghostscript on the result:

   $ gs -sDEVICE=pdfwrite -dNOPAUSE -o out.pdf test.pdf

@mojavelinux Thank you for the report. I can reproduce the issue. I'm not quite sure what the warning means. I tried googling but didn't find the definitive answer. A ticket on Ghostscript bugzilla implies that it might be a result of a file containing more than one font.

However, that is not the case here. Subset fonts generated by both 1.5.1 & 1.6.0 contain only one font. TTFunk never generated flawless subset files as it doesn't do recalcualtions for many-many values in numerous font tables but it rarely (if ever) generated completely invalid fonts that threw off most renderers. If anything, fonts generated by 1.6.0 are less broken (as evident by FontValidator report). I tried the test PDFs and for both 1.5.1 & 1.6.0 they look fine in Preview and Acrobat. Please let me know if that is not the case for your actual documents.

I don't quite understand the purpose of the last step, though. What are you trying to achieve there?

Thank you for the report.

👍

A ticket on Ghostscript bugzilla implies that it might be a result of a file containing more than one font.

I highly doubt that's the case as I've used many fonts before and never had that issue.

I don't quite understand the purpose of the last step, though. What are you trying to achieve there?

One is to reduce the size of the PDF since Prawn generates PDF which can be quite large and there's is plenty of room to optimize the streams. But it also validates the PDF (according to ghostscript, which I consider a trusted source).

You can enable compression (compress: true option to Document constructor) to make a document a bit smaller.

Can't advise on the validation front. Acrobat can validate PDFs but it's a paid app and is not available for all major OSs.

While compress: true is helpful, it doesn't go as far as Ghostscript.

My question is, what changed in the TTF information that's stored in the PDF? My understanding is that OTF support was about reading font information from a new format, not changing what was put into the PDF in the end. Something was added / changed in the subsetting that Ghostscript doesn't like. And if Ghostscript is going to complain, I can't use this version of ttfunk or else I will receive a flood of issue reports about the warning.

Hey @pointlessone and @mojavelinux, I was able to track down the problem and created this PR: #71

Unfortunately, while the PR now generates fonts ghostscript appears to like, my digging has uncovered that a bunch of the values TTFunk is writing into the maxp table are wrong. During my work on OTF support, I ran subsetted fonts through FontValidator to verify their correctness, but somehow it didn't catch these maxp problems (no idea why). Fortunately most font rendering applications don't seem to care very much, but as we saw with ghostscript that's not a guarantee. The previous release of TTFunk simply copied over the contents of the original maxp table, which is also wrong. So in a sense, we are no more wrong than we were before hehe.

I think we're going to have to parse glyph outlines to fully address the problem. As it happens I have most of the code ready to go. It's part of ttfunk-woff2. I never finished the lib (maybe someday?) but the glyph parsing code works.

I'll try to work on it in the near future. Apologies for all the trouble :(

Thanks for the update @camertron. And nice detective work! I can appreciate how much time and effort that takes. Honestly, no need to apologize. You're doing amazing work. I look forward to the update when you have the time. Let me know if you need any help testing. I'd be glad to help.

My understanding is that maxp is in a way advisory. It's purpose is to contain memory requirements for the font. I don't think any modern font library actually uses it for memory allocation. Even somewhat constrained devices (e.g. any printer from the last decade) has much more memory than any font can consume.

TTFunk always copied the values from the original font. Which could make the values invalid as in being higher than required for the subset font. This technically could've lead to higher memory use (if the font lib used those values for allocation) but not to a completely broken font.

Said all that, valid maxp is a nice thing to have but likely not worth the added complexity for the Prawn's sake.

@pointlessone hmm yeah I think I agree with you that it's likely not worth the added complexity. Should we revert those changes to maxp so it writes the original table contents as it used to?

@camertron Thank you a lot for this feature. Stellar work!

I'm closing this as DONE.