[subset] `head.indexToLocFormat=0` causes decompressing error

Question

Closed this issue 5 months ago · 4 comments

hb-subset Loongtype.ttf -o subset.ttf --text="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ鸿福酱酒送礼套装香气醇厚佳品自带具质保证躯" --name-IDs='*' --name-languages='*' --layout-features='*'

When the character 躯 is removed from `--text`, `head.indexToLocFormat` becomes 0; otherwise it is 1.

When the subset font with `head.indexToLocFormat=0` is then compressed to WOFF2, the woff2-rs library reports an error while decompressing it.

See: woff2-rs PR and src/glyf_decoder/mod.rs:367

thread 'decode::tests::test_glyf' panicked at src/glyf_decoder/mod.rs:378:80:
called `Result::unwrap()` on an `Err` value: TryFromIntError(())
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Also, how is `indexToLocFormat` calculated when subsetting?

The same thing happens with pyftsubset. cc @anthrotype

Answer 1 · 2024-01-31T09:57:49.000Z

This looks like a bug in woff2-rs; there is nothing harfbuzz can do about it. `head.indexToLocFormat` specifies whether the loca table uses short (Offset16) or long (Offset32) offsets. Usually one wants to choose the most compact serialization.
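To make the two formats concrete, here is a minimal sketch of reading a loca table under either value of `indexToLocFormat`. The function name `parse_loca` is hypothetical. The layout follows the OpenType definition of loca (short offsets are stored divided by 2); it is not taken from woff2-rs or harfbuzz.

```python
import struct


def parse_loca(data, num_glyphs, index_to_loc_format):
    """Return the numGlyphs + 1 byte offsets into glyf (hypothetical helper)."""
    count = num_glyphs + 1  # loca has one extra entry marking the end of glyf
    if index_to_loc_format == 0:
        # Short format: big-endian uint16 values holding (offset / 2).
        raw = struct.unpack(f">{count}H", data[: count * 2])
        return [value * 2 for value in raw]
    if index_to_loc_format == 1:
        # Long format: big-endian uint32 values holding the offset itself.
        return list(struct.unpack(f">{count}I", data[: count * 4]))
    raise ValueError(f"unknown indexToLocFormat: {index_to_loc_format}")


# Two glyphs, three offsets, stored in the short format.
short_table = struct.pack(">3H", 0, 50, 120)
print(parse_loca(short_table, 2, 0))
```

A decoder has to handle both branches, because a subsetter is free to emit either one depending on the size of the glyf data.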

Answer 2 · 2024-01-31T09:59:01.000Z

> I would like to know how indexToLocFormat is calculated when subsetting?

I'm not sure exactly where hb-subset computes that, but fonttools does it here:

https://github.com/fonttools/fonttools/blob/0572f7871823bdef3ceceaf41dedd0a6bd100995/Lib/fontTools/ttLib/tables/_l_o_c_a.py#L42-L49
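As I understand the linked fonttools lines, the rule can be sketched roughly as follows: the short format is used only when the largest offset is below 0x20000 and every offset is even; otherwise the long format is used. The names `choose_index_to_loc_format` and `build_loca` are hypothetical, and this is a simplified paraphrase rather than the fonttools code itself; hb-subset may differ in the details.

```python
import struct

# Short offsets are stored as offset / 2 in a uint16, so the largest
# representable byte offset is 0xFFFF * 2 = 0x1FFFE.
SHORT_FORMAT_LIMIT = 0x20000


def choose_index_to_loc_format(offsets):
    """Return 0 (short) when every offset fits, else 1 (long)."""
    fits = max(offsets, default=0) < SHORT_FORMAT_LIMIT
    all_even = all(offset % 2 == 0 for offset in offsets)
    return 0 if fits and all_even else 1


def build_loca(offsets):
    """Serialize offsets and report the chosen indexToLocFormat."""
    fmt = choose_index_to_loc_format(offsets)
    if fmt == 0:
        data = struct.pack(f">{len(offsets)}H", *(o // 2 for o in offsets))
    else:
        data = struct.pack(f">{len(offsets)}I", *offsets)
    return fmt, data


print(choose_index_to_loc_format([0, 100, 0x1FFFE]))  # small glyf table
print(choose_index_to_loc_format([0, 100, 0x20000]))  # glyf table of 128 KiB
```

Under this rule, dropping a single glyph can flip the value: if removing 躯 brings the total glyf size below 0x20000 bytes, the subsetter switches from 1 to 0, which would match the behavior reported in the question.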

Answer 3 · 2024-01-31T17:36:03.000Z

This is where it's decided in hb:
https://github.com/harfbuzz/harfbuzz/blob/main/src/OT/glyf/glyf.hh#L120

Answer 4 · 2024-01-31T19:35:56.000Z

Closing as this isn't a defect in harfbuzz.