papandreou/subset-font

Options to configure what to retain in the name table of a font?

bjrn opened this issue · 8 comments

bjrn commented

Hi, this might be more related to harfbuzzjs, but since the options would likely need to be set in subset-font I reckoned I'd start here … I'm looking for a way to retain the Font License in the subsetted woff2 files.

Glyphhanger, which uses pyftsubset under the hood, has a config section regarding what to retain in the name table of a subsetted font:

...

Font naming options:
  These options control what is retained in the 'name' table. For numerical
  codes, see: http://www.microsoft.com/typography/otspec/name.htm
  --name-IDs[+|-]=<nameID>[,<nameID>...]
      Specify (=), add to (+=) or exclude from (-=) the set of 'name' table
      entry nameIDs that will be preserved. By default only nameID 1 (Family)
      and nameID 2 (Style) are preserved. Use '*' to keep all entries.
      Examples:
        --name-IDs+=0,4,6
            * Also keep Copyright, Full name and PostScript name entry.
        --name-IDs=''
            * Drop all 'name' table entries.
        --name-IDs='*'
            * keep all 'name' table entries

My use case, retaining the font license information, would mean adding a --name-IDs=13,14 flag according to the docs.

ID Description
13 License Description; description of how the font may be legally used, or different example scenarios for licensed use. This field should be written in plain language, not legalese.
14 License Info URL; URL where additional licensing information can be found.
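
Going by the help text quoted above, `=` replaces the preserved set while `+=` extends it, so `--name-IDs=13,14` would keep only the two license entries and `--name-IDs+=13,14` would keep them on top of the defaults. A minimal sketch of that set logic, modelling only what the help text describes (the helper name is made up for illustration, it is not part of pyftsubset, and `'*'` is not handled):

```javascript
// Defaults as stated in the quoted help text: nameID 1 (Family) and 2 (Style).
const DEFAULT_NAME_IDS = [1, 2];

// Hypothetical helper: applies a single --name-IDs flag to the current set.
function applyNameIdsFlag(current, operator, ids) {
  if (!['=', '+=', '-='].includes(operator)) {
    throw new Error(`Unknown operator: ${operator}`);
  }
  // '=' starts from an empty set, '+=' and '-=' start from the current one.
  const result = new Set(operator === '=' ? [] : current);
  for (const id of ids) {
    if (operator === '-=') result.delete(id);
    else result.add(id);
  }
  return [...result].sort((a, b) => a - b);
}

console.log(applyNameIdsFlag(DEFAULT_NAME_IDS, '=', [13, 14])); // 13 and 14 only
console.log(applyNameIdsFlag(DEFAULT_NAME_IDS, '+=', [13, 14])); // 1, 2, 13 and 14
```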

I verified that parts of the name table, like "Copyright", are retained by dropping the file generated by subset-font onto wakamaifondue. The license info is not retained, and as far as I can tell, it's not enabled by default in pyftsubset/glyphhanger either – one has to specify the flag.

Totally makes sense to expose options for this here, but yeah, someone will need to dive into harfbuzz(js) to figure out how to do it 😅 . Happy to help with whatever I can, but I've always had to seek the assistance of @ebraminio when diving down there.

The related API, hb_subset_input_nameid_set, is exposed in harfbuzzjs's subset build. It should hopefully work, unless it is disabled by HB_TINY; if so, HB_NO_NAME should be put in https://github.com/harfbuzz/harfbuzzjs/blob/main/subset/config-override.h and I should make a new release for it. @papandreou, please have a look at these and confirm. Thanks! :)

Amazing! @bjrn, if you have a working wasm build toolchain, maybe you can play around with it? 🤗

Otherwise I can try to help later.

I've taken a quick look at it, and being a good boy I'm starting by trying to write some tests. So the first order of business is to find a module or tool that can list the name ids that are present in a font represented by a buffer.

Looks like fontkit doesn't expose an easy way to get the parsed name table.

There's good old ttx:

$ ttx -t name -o - testdata/OpenSans.ttf 
Dumping "testdata/OpenSans.ttf" to "-"...
Dumping 'name' table...
<?xml version="1.0" encoding="UTF-8"?>
<ttFont sfntVersion="\x00\x01\x00\x00" ttLibVersion="4.24">
  <name>
    <namerecord nameID="0" platformID="3" platEncID="1" langID="0x409">
      Digitized data copyright © 2010-2011, Google Corporation.
    </namerecord>
    <namerecord nameID="1" platformID="3" platEncID="1" langID="0x409">
      Open Sans
    </namerecord>
    <namerecord nameID="2" platformID="3" platEncID="1" langID="0x409">
      Regular
    </namerecord>
    <namerecord nameID="3" platformID="3" platEncID="1" langID="0x409">
      1.10;1ASC;OpenSans-Regular
    </namerecord>
    <namerecord nameID="4" platformID="3" platEncID="1" langID="0x409">
      Open Sans Regular
    </namerecord>
    <namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">
      Version 1.10
    </namerecord>
    <namerecord nameID="6" platformID="3" platEncID="1" langID="0x409">
      OpenSans-Regular
    </namerecord>
    <namerecord nameID="14" platformID="3" platEncID="1" langID="0x409">
      http://www.apache.org/licenses/LICENSE-2.0
    </namerecord>
  </name>
</ttFont>

... but it'd be kinda annoying to have a dev dependency on Python and fonttools just to gain access to that tool in the test suite 😕
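
For an uncompressed sfnt (.ttf/.otf) buffer, the nameIDs can in principle be read straight from the table directory without any extra dependency. A minimal sketch, assuming the standard OpenType layout (12-byte header, 16-byte table records, 6-byte 'name' header followed by 12-byte name records); the function name is made up for illustration, and it will not work on woff/woff2 since those are compressed:

```javascript
// Hypothetical helper: returns the sorted, de-duplicated nameIDs found in the
// 'name' table of an uncompressed sfnt font held in a Node.js Buffer.
function listNameIds(buffer) {
  const numTables = buffer.readUInt16BE(4);
  for (let i = 0; i < numTables; i += 1) {
    // Table records start right after the 12-byte sfnt header.
    const recordOffset = 12 + i * 16;
    const tag = buffer.toString('latin1', recordOffset, recordOffset + 4);
    if (tag !== 'name') continue;

    const tableOffset = buffer.readUInt32BE(recordOffset + 8);
    const count = buffer.readUInt16BE(tableOffset + 2);
    const ids = new Set();
    for (let j = 0; j < count; j += 1) {
      // Each name record is 12 bytes; nameID is the fourth uint16 (byte 6).
      const nameRecordOffset = tableOffset + 6 + j * 12;
      ids.add(buffer.readUInt16BE(nameRecordOffset + 6));
    }
    return [...ids].sort((a, b) => a - b);
  }
  return []; // no 'name' table present
}
```

Something like `listNameIds(require('fs').readFileSync('testdata/OpenSans.ttf'))` should then give the same IDs as the ttx dump above, though I haven't checked it against real fonts.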

I guess we can't use harfbuzz itself for this?

It does seem to work fine, though: https://github.com/papandreou/subset-font/compare/feature/preserveNameId

@ebraminio, all good, hb_subset_input_nameid_set is present in the latest released harfbuzzjs 😌

bjrn commented

Wow, nice work! Agreed about not wanting to add a dev dependency on Python, so I did a little digging:
wakamaifondue is using LibFont to get metadata out of the font: see its nameTable parsing and the variables that map name table IDs to readable names.

It also appears as if FontKit has support for more names than it exposes: see its code for name table extraction.

I did a quick check to see if it was possible to get the metadata from FontKit somehow, and it seems like this could work:

const fontkit = require('fontkit');
// `file` is the path to a font file on disk
const font = fontkit.openSync(file);
console.log(font.name.records);

FontKit exposes some default properties like font.fullName, but it appears as if font.name.records exposes the full set (although each record is an object keyed by language).

Running it on a .ttf version of Noto Sans Bold yields the following:

{
  copyright: { en: 'Copyright 2012 Google Inc. All Rights Reserved.' },
  fontFamily: { en: 'Noto Sans' },
  fontSubfamily: { en: 'Bold' },
  uniqueSubfamily: { en: 'Monotype Imaging - Noto Sans Bold' },
  fullName: { en: 'Noto Sans Bold' },
  version: { en: 'Version 1.04' },
  postscriptName: { en: 'NotoSans-Bold' },
  trademark: {
    en: 'Noto is a trademark of Google Inc. and may be registered in certain jurisdictions.'
  },
  manufacturer: { en: 'Monotype Imaging Inc.' },
  designer: { en: 'Monotype Design team' },
  description: { en: 'Designed by Monotype design team' },
  vendorURL: { en: 'http://code.google.com/p/noto/' },
  designerURL: {
    en: 'http://www.monotypeimaging.com/ProductsServices/TypeDesignerShowcase'
  },
  license: { en: 'Licensed under the Apache License, Version 2.0' },
  licenseURL: { en: 'http://www.apache.org/licenses/LICENSE-2.0' }
}

I also verified that FontKit font.name.records works with a .woff2 version that I subsetted with pyftsubset, including the --name-IDs=13,14 flag.
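
A check along these lines could be what the tests assert on. This is only a sketch: the mapping from nameID to key is inferred from the order of the output above (0 = copyright up to 14 = licenseURL), and the helper name is made up for illustration.

```javascript
// Keys as they appear in the font.name.records output above, indexed by nameID.
// Assumption: the position in this list equals the OpenType nameID (0..14).
const FONTKIT_KEYS_BY_NAME_ID = [
  'copyright', 'fontFamily', 'fontSubfamily', 'uniqueSubfamily', 'fullName',
  'version', 'postscriptName', 'trademark', 'manufacturer', 'designer',
  'description', 'vendorURL', 'designerURL', 'license', 'licenseURL',
];

// Hypothetical helper: returns the requested nameIDs that have no entry in
// `records`. IDs outside the mapping (above 14) are also reported as missing.
function missingNameIds(records, nameIds) {
  return nameIds.filter((id) => {
    const key = FONTKIT_KEYS_BY_NAME_ID[id];
    return !key || !records[key];
  });
}

// Example with a records object shaped like the output above.
const records = {
  fontFamily: { en: 'Noto Sans' },
  license: { en: 'Licensed under the Apache License, Version 2.0' },
};
console.log(missingNameIds(records, [13, 14])); // only 14 (licenseURL) is missing
```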

So I guess this would mean that FontKit could be used in the tests instead of ttx — I won't have time during the day, but if you want I can take a stab at testing it on your branch tonight?

Ah, nice, I fiddled around with fontkit, but didn't realize that it could be used like that. Updated the branch now.

Released the support for preserveNameIds in 1.3.0 just now. We can add more detailed controls later if need be :)
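
For the license use case from the top of the thread, usage would presumably look something like the sketch below. It assumes that preserveNameIds takes an array of numeric name IDs and that subsetFont(buffer, text, options) returns a promise for the subset buffer; the two helper names are made up for illustration, so check the README of the version you install.

```javascript
// Name IDs from the 'name' table: 13 = License Description, 14 = License Info URL.
const LICENSE_NAME_IDS = [13, 14];

// Hypothetical helper: builds the options object passed to subset-font.
function licenseOptions(targetFormat = 'woff2') {
  return { targetFormat, preserveNameIds: LICENSE_NAME_IDS };
}

// Hypothetical helper: subsets `fontBuffer` to the glyphs needed for `text`
// while asking for the license entries to be kept.
async function subsetKeepingLicense(fontBuffer, text) {
  // Required lazily so this file still loads where subset-font is not installed.
  const subsetFont = require('subset-font');
  return subsetFont(fontBuffer, text, licenseOptions());
}
```

The result can then be opened with fontkit and checked via font.name.records, as in the comments above.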