Why use UTF16ToUTF8()?
git-host-admin opened this issue · 3 comments
Hi:
I'm from China, and there are many Chinese fonts. When I use getFontName() or similar functions, the return value is not valid, but if I remove the UTF16ToUTF8() call, I get the value we want.
The library assumes that all name strings are encoded as UTF-16 by default, without taking the provided platform ID into consideration.
I have worked around this issue by subclassing the name table class and overriding the _parse() function:
$font = \FontLib\Font::load($path);
//Replace old table
$tables = $font->getTable();
$table = new \Additions\Table\Type\nameEncoding($tables['name']);
$table->parse();
$font->setTableObject('name', $table);
$font->getFontPostscriptName();
class nameEncoding extends name {
    private static $header_format = array(
        "format"       => self::uint16,
        "count"        => self::uint16,
        "stringOffset" => self::uint16,
    );

    protected function _parse() {
        // override here
    }
}
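For readers unfamiliar with the header format array above: it describes three consecutive big-endian uint16 fields at the start of the 'name' table. As a rough standalone sketch (independent of the library; `readNameHeader` is a hypothetical helper name, not a php-font-lib function), those six bytes could be read with PHP's built-in unpack():

```php
<?php
// Hypothetical helper for illustration only; it is NOT part of php-font-lib.
// Reads the 6-byte 'name' table header: three big-endian uint16 fields.
function readNameHeader(string $tableData): array
{
    if (strlen($tableData) < 6) {
        throw new InvalidArgumentException("name table header needs at least 6 bytes");
    }

    // 'n' = unsigned 16-bit integer, big-endian byte order.
    return unpack('nformat/ncount/nstringOffset', substr($tableData, 0, 6));
}

// Example: format 0, two name records, string storage starting at byte 30
// (6-byte header + 2 records of 12 bytes each).
$header = readNameHeader("\x00\x00\x00\x02\x00\x1E");
```

In the actual subclass the library's own stream-reading methods would be used instead; this only illustrates what the three fields contain.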
The global conversion from UTF-16 was, for the most part, in accordance with the spec.
Relating to platform ID 0 (Unicode):
All Unicode-based names must be in UTF-16BE (big-endian, two-byte encoding). UTF-8 and UTF-32 (one- and four-byte encodings) are not allowed.
Relating to platform ID 3 (Windows):
Encoding IDs for platform 3 'name' entries should match the encoding IDs used for platform 3 subtables in the 'cmap' table. When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1. When building a symbol font for Windows, the platform ID should be 3 and the encoding ID should be 0. All string data for platform 3 must be encoded in UTF-16BE.
However, it is also true that other encodings may be used (as seen in the supplied font). While I haven't completely addressed the underlying deficiency in how the library handles string encoding, the changes implemented for the next release should be sufficient for most cases. Expanded encoding support will be built out as needed based on user feedback.
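To illustrate the idea of choosing the decoding based on the platform ID rather than converting everything unconditionally, here is a minimal sketch. It is not the library's implementation: `decodeNameString` is a hypothetical function, it assumes the mbstring extension is available, and passing strings from other platforms through unchanged is an assumption made purely for the example.

```php
<?php
// Hypothetical helper for illustration only; it is NOT part of php-font-lib.
// Assumes the mbstring extension is loaded.
function decodeNameString(string $raw, int $platformID): string
{
    switch ($platformID) {
        case 0: // Unicode: the spec requires UTF-16BE
        case 3: // Windows: the spec requires UTF-16BE
            return mb_convert_encoding($raw, 'UTF-8', 'UTF-16BE');

        default:
            // Assumption for this sketch: leave strings from other
            // platforms untouched instead of guessing their encoding.
            return $raw;
    }
}

// "\x4E\x2D\x65\x87" is U+4E2D U+6587 in UTF-16BE.
$windowsName = decodeNameString("\x4E\x2D\x65\x87", 3);
$otherName   = decodeNameString("Arial", 1);
```

A font that stores non-UTF-16 data under platform 3, as the supplied font apparently does, would still need extra handling beyond this sketch.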
I noticed that the sample font provided uses cmap subtable format 2, which isn't yet supported. I added support for that format and improved encoding support in other areas of the library so that the next release will correctly re-encode this font.
The re-encoded font now loads correctly in browsers that do not load the original font due to spec compliance issues.