Font Anti-Fingerprinting
This is an early draft of a proposal to eliminate font-based fingerprinting on the web. You're welcome to contribute!
- Introduction
- Terminology
- Goals
- Non-goals
- Proposal
- Key scenarios
- Detailed design discussion
- Considered alternatives
- Metrics to justify shipping
- Stakeholder Feedback / Support / Opposition
- References & acknowledgements
Introduction
Differences in the set of fonts users have installed locally contribute many bits to a unique fingerprint of that user. TODO: link to some quantification.
Terminology
Locale
Here a "locale" is the tag negotiated by
the Accept-Language
header. This is often one language+country pair of the form
"en-GB" but other variations are common, as described in RFC 5646:
- "es-419" represents Spanish ('es') appropriate to the UN-defined Latin America and Caribbean region ('419').
- "sr-Latn-RS" represents Serbian ('sr') written using Latin script ('Latn') as used in Serbia ('RS').
- "zh-Hans-CN" represents Chinese ('zh') written in the Simplified script ('Hans') in China ('CN').
The locale is not the whole Accept-Language
list.
Goals
Make font-based queries useless for distinguishing any two users running:
- the same major version of the same browser
- on the same version of the same operating system
- in the same locale.
Non-goals
Attempts to reduce identifiability based on locale, browser, or OS choice are out of scope for this explainer. These are plausibly permanent bits in an active fingerprint because:
- The site needs the locale to show users content in a language they understand.
- The browser and its version are exposed by
Sec-CH-UA
. - The OS and its version are exposed by the opt-in
Sec-CH-UA-Platform
.
This proposal also does not attempt to prevent sites from distinguishing users who have customized their generic font families to use non-default fonts.
Proposal
Each browser will be able to map from a locale (as in Goals, just the
negotiated language+country pair, not the whole Accept-Language
list) to:
Allowed system fonts
This resembles Safari's list of fonts usable from CSS, which it receives from an
OS API. Ideally each operating system should provide a system API to determine
whether a given font is pre-installed or user-installed. Browsers will only
allow use of pre-installed fonts in places like the @font-face
src:
local()
function
and the font-family
property.
Aggressively-cached web fonts
The set of most-commonly-used web fonts for each locale will be derived from metrics gathered from browser telemetry. We should share a single list across all browsers, and publicize this list so developers can rely on it. TODO: Figure out how much usage makes a font one of the most-commonly-used fonts. Is that a number or disk-size of fonts, a percentage of page loads, or what?
The first time a user visits a page that uses one of these fonts, it's downloaded and cached until it's no longer in the set of commonly-used fonts, which could be forever. See When to cache the webfonts.
Key scenarios
TODO: look through discussion threads to check that this solves the objections.
Detailed design discussion
When to cache the webfonts
It would be safest to pre-cache all of these fonts when a new major version of a browser is installed, but this might waste valuable bandwidth and disk space for a font that a particular user never happens to need.
I believe it's also safe to cache each font at the point where it's first used, as long as the cache never evicts fonts. This allows exactly one site to determine that a user has not visited any site that either uses the font or has tried to learn this fact about the user.
If the user removes a locale from their Accept-Language
list, it's plausible to
evict fonts that aren't common for their new set of locales. If the user then
re-adds that locale, it gives one site another chance to learn something about
that user, but changing the Accept-Language list is rare enough that this seems
acceptable.
How to support dynamic font subsetting?
https://blog.typekit.com/2015/06/15/announcing-east-asian-web-font-support/ describes a piece of Javascript that dynamically requests the particular characters within a font that a particular page actually uses. This presents some difficulties for the approach here, depending on how exactly the Javascript works:
- If Adobe has defined a separate URL for each character's font subset, and the Javascript requests many of those to cover the characters on a page, the set of most-commonly-used web fonts may be large, as it will consist of the set of all popular characters from all popular fonts.
- If Adobe has defined a query parameter or similar that specifies the set of characters to include, there will probably be too many subsets for any of them to appear commonly-used. These fonts will have to be re-downloaded for each top-level site.
There may be space for a new CSS specification to help browsers optimize this better.
Considered alternatives
Just define one list of fonts, not depending on locale
This is likely to break the experience for users who read a minority language and have expensive mobile data. They'll no longer be able to pre-cache the fonts they need locally, and double-keyed web storage will prevent them from even keeping a cache of web fonts across multiple sites.
Define a set of local fonts
If we pick a set of commonly-used local fonts for each locale now, I believe we'll have a hard time updating that set as new fonts are developed. By continually collecting metrics on popular web fonts, we'll naturally notice if developers like a new font enough.
This suggestion to aggressively cache a widely-used subresource has come up for Javascript libraries too, with the objection that it advantages the already-winning frameworks and makes it hard to evolve the web. The same objection is valid for fonts, but it seems less important to encourage font evolution.
Metrics to justify shipping
-
Within each locale, <= 0.???% of page loads will be "broken" by the new restrictions. This criterion is meant to ensure we don't disenfrancise minority languages or scripts.
-
<= 0.0???% of overall page loads will be "broken" by the new restrictions.
We consider a page load "broken" if it uses a different font before and after the new restrictions, and the "after" font was selected by a generic-family name or is the last-resort font.
-
A user who fetches all popular fonts for their locale uses no more than ??MB of disk space.
-
TODO: Something about how much extra network transfer a ??%ile user uses after the new restrictions.
Stakeholder Feedback / Support / Opposition
- CSSWG: No signals
- Browsers:
- Chrome: Positive
- Edge: No signals
- Firefox: No signals
- Opera: No signals
- Safari: Support limiting font variation to be based on (browser,OS,locale). Concern about aggressively caching web fonts, especially if it's done before the user actually needs the font.
- Samsung: No signals
- UC: No signals
- Web developers: No signals
References & acknowledgements
Many thanks for valuable feedback and advice from:
- Pete Snyder
- Safari for showing that a fixed local font list is web-compatible
- Tab Atkins
- The TAG for writing https://w3ctag.github.io/explainers
- Other CSSWG folks on various bug threads