Kozea/tinycss2

Processing of unicode-range working incorrectly


Attempting to parse the following file: https://fonts.googleapis.com/css2?family=Lato:ital,wght@0,100;0,300;0,400;0,700;0,900;1,100;1,300;1,400;1,700;1,900&display=swap

Printing the tokens for the first font-family gives me the output below, which appears to show two WhitespaceTokens in place of the UnicodeRange tokens for unicode-range: U+0100-024F, U+0259, U+1E00-1EFF, U+2020, U+20A0-20AB, U+20AD-20CF, U+2113, U+2C60-2C7F, U+A720-A7FF;

<WhitespaceToken>
<IdentToken font-family>
<LiteralToken :>
<WhitespaceToken>
<StringToken "Lato">
<LiteralToken ;>
<WhitespaceToken>
<IdentToken font-style>
<LiteralToken :>
<WhitespaceToken>
<IdentToken italic>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken font-weight>
<LiteralToken :>
<WhitespaceToken>
<NumberToken 100>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken font-display>
<LiteralToken :>
<WhitespaceToken>
<IdentToken swap>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken src>
<LiteralToken :>
<WhitespaceToken>
<URLToken url(https://fonts.gstatic.com/s/lato/v17/S6u-w4BMUTPHjxsIPy-sNiXg7Q.woff)>
<WhitespaceToken>
<FunctionBlock format( … )>
<LiteralToken ;>
<WhitespaceToken>
<WhitespaceToken>

It looks like the abstract syntax tree is only set up to parse a single range (see ast.py).

liZe commented

Hi!

Looks like it works for me:

<WhitespaceToken>
<IdentToken font-family>
<LiteralToken :>
<WhitespaceToken>
<StringToken "Lato">
<LiteralToken ;>
<WhitespaceToken>
<IdentToken font-style>
<LiteralToken :>
<WhitespaceToken>
<IdentToken italic>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken font-weight>
<LiteralToken :>
<WhitespaceToken>
<NumberToken 100>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken font-display>
<LiteralToken :>
<WhitespaceToken>
<IdentToken swap>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken src>
<LiteralToken :>
<WhitespaceToken>
<URLToken url(https://fonts.gstatic.com/s/lato/v17/S6u-w4BMUTPHjxsIPx-mPCLQ7A.woff2)>
<WhitespaceToken>
<FunctionBlock format( … )>
<LiteralToken ;>
<WhitespaceToken>
<IdentToken unicode-range>
<LiteralToken :>
<WhitespaceToken>
<UnicodeRangeToken 256 591>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 601 601>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 7680 7935>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 8224 8224>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 8352 8363>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 8365 8399>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 8467 8467>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 11360 11391>
<LiteralToken ,>
<WhitespaceToken>
<UnicodeRangeToken 42784 43007>
<LiteralToken ;>
<WhitespaceToken>
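
The UnicodeRangeToken values above are the decimal forms of the hexadecimal code points in the stylesheet, so U+0100-024F shows up as 256 591. Below is a minimal standard-library sketch for checking that mapping. Note that parse_unicode_range is a hypothetical helper, not part of tinycss2, and it only handles the single-code-point and start-end forms (no "?" wildcards).

```python
def parse_unicode_range(text):
    """Convert 'U+0100-024F' or 'U+0259' to a (start, end) pair of ints.

    Hypothetical helper for illustration; not a tinycss2 function.
    """
    body = text.strip()
    if body[:2].upper() != "U+":
        raise ValueError(f"not a unicode-range value: {text!r}")
    # Split "0100-024F" into start and end; a single code point has no "-".
    start_hex, _, end_hex = body[2:].partition("-")
    start = int(start_hex, 16)
    end = int(end_hex, 16) if end_hex else start
    return start, end


# The unicode-range value quoted in the issue description.
value = (
    "U+0100-024F, U+0259, U+1E00-1EFF, U+2020, U+20A0-20AB, "
    "U+20AD-20CF, U+2113, U+2C60-2C7F, U+A720-A7FF"
)
ranges = [parse_unicode_range(part) for part in value.split(",")]
for start, end in ranges:
    print(start, end)
```

Each printed pair should line up with one UnicodeRangeToken in the listing, in the same order.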

Your problem probably comes from the way Google Fonts serves its stylesheets: the CSS differs depending on the HTTP client. The unicode-range property is there when you use Google Chrome, but it’s not there when you use curl or Requests, for example.
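
One way to check this is to fetch the same URL with different User-Agent headers and look for unicode-range in each response. The following is a rough sketch using only the standard library. The helper names and the User-Agent string are assumptions made for illustration, and which user agents actually receive unicode-range is up to Google Fonts and may change.

```python
import urllib.request

# Illustrative User-Agent string (an assumption, not taken from this thread).
CHROME_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Build a request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_css(url, user_agent=None):
    """Download a stylesheet as text (performs a network request)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def has_unicode_range(css):
    """Cheap textual check before handing the CSS to a real parser."""
    return "unicode-range" in css
```

Comparing has_unicode_range(fetch_css(url)) with has_unicode_range(fetch_css(url, CHROME_UA)) for the stylesheet URL should show whether the server varies its output by client before you suspect the tokenizer.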

Well, that's extremely embarrassing - I was telling Google I was using Chrome 35, so it gave me the CSS appropriate for that browser 🤦
Thank you very much!!