support filter list pre-processor rules, eg. ifFirefox
seia-soto opened this issue · 7 comments
Containing environmental information in Filter
By including environmental information like `env_<extension_name>` as a bit mask in the `Filter` object, we can dynamically choose which filters to enable and disable.
Problem: Exceeding 32 bits to include all variables
In this case, we need to be prepared to expand the current structure. According to the uBlock Origin wiki, there are 24 preprocessor variables in total. However, the current structure of `NetworkFilter` already uses 30 bits and `CosmeticFilter` uses 8 bits.
The mask of `NetworkFilter`
/**
* Masks used to store options of network filters in a bitmask.
*/
export const enum NETWORK_FILTER_MASK {
// Request Type
fromDocument = 1 << 0,
fromFont = 1 << 1,
fromHttp = 1 << 2,
fromHttps = 1 << 3,
fromImage = 1 << 4,
fromMedia = 1 << 5,
fromObject = 1 << 6,
fromOther = 1 << 7,
fromPing = 1 << 8,
fromScript = 1 << 9,
fromStylesheet = 1 << 10,
fromSubdocument = 1 << 11,
fromWebsocket = 1 << 12, // e.g.: ws, wss
fromXmlHttpRequest = 1 << 13,
// Partiness
firstParty = 1 << 14,
thirdParty = 1 << 15,
// Options
// FREE - 1 << 16
isBadFilter = 1 << 17,
isCSP = 1 << 18,
isGenericHide = 1 << 19,
isImportant = 1 << 20,
isSpecificHide = 1 << 21,
// Kind of patterns
isFullRegex = 1 << 22,
isRegex = 1 << 23,
isUnicode = 1 << 24,
isLeftAnchor = 1 << 25,
isRightAnchor = 1 << 26,
isException = 1 << 27,
isHostnameAnchor = 1 << 28,
isRedirectRule = 1 << 29,
}
The mask of `CosmeticFilter`
/**
* Masks used to store options of cosmetic filters in a bitmask.
*/
const enum COSMETICS_MASK {
unhide = 1 << 0,
scriptInject = 1 << 1,
isUnicode = 1 << 2,
isClassSelector = 1 << 3,
isIdSelector = 1 << 4,
isHrefSelector = 1 << 5,
remove = 1 << 6,
extended = 1 << 7,
}
https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#if-condition
Some variables can be merged into one variable, e.g. `adguard`.
I would recommend not storing this information in the filter objects themselves but instead in some other data structure of the FiltersEngine class or the network/cosmetic buckets. One reason is what you mention about the need to add more attributes: this would add overhead to all filters, even though only a very small minority of them are affected by the pre-processor rules. A second reason is more conceptual: the pre-processor directives are not part of the filters themselves (in the list definitions) but sit outside of them, indicating which filters should be included or not depending on some external conditions. Lastly, and this is a more minor point, these directives could potentially be resolved statically at engine build time, in which case it would not make much sense to have extra attributes in all filters, since they would be pure overhead without a function (in this sense, an optional data structure stored outside of the filters makes more sense to me; see below).
An alternative approach could be to have an optional set of filter IDs per environment at the FiltersEngine level, which we can then use to discard matching filters that do not belong to the current environment.
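A minimal sketch of what such a structure could look like. The names `EnvironmentExclusions`, `disable`, and `isEnabled` are hypothetical and not part of the adblocker API; the sketch stores, per environment, the IDs of filters that must be discarded there, which is one possible reading of the idea above.

```typescript
// Hypothetical sketch: none of these names exist in the adblocker library.
type FilterId = number;

class EnvironmentExclusions {
  // Environment name -> IDs of filters that must NOT be used in that environment.
  private readonly disabledByEnv = new Map<string, Set<FilterId>>();

  disable(env: string, id: FilterId): void {
    let ids = this.disabledByEnv.get(env);
    if (ids === undefined) {
      ids = new Set<FilterId>();
      this.disabledByEnv.set(env, ids);
    }
    ids.add(id);
  }

  // A filter is enabled unless it is explicitly excluded for the given environment.
  isEnabled(env: string, id: FilterId): boolean {
    const ids = this.disabledByEnv.get(env);
    return ids === undefined || !ids.has(id);
  }
}

const exclusions = new EnvironmentExclusions();
exclusions.disable('env_firefox', 42);
```

Filters keep their current layout, and engines without any conditional filters pay only for an empty map.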
Having an optional set of filter IDs that defines which filters should be disabled looks like a nice approach to me. Also, given the nature of this project, the environment information should be supplied from the outside.
Not sure if preprocessor flags can be applied at build time, for a few reasons:
- Capability conditions like `cap_html_filtering` cannot be resolved at build time.
- In the future we want users to load custom lists and create their own filters, so preprocessor support can be useful at runtime.
- With 24 flags supported by uBO, we would have to generate 24 more engines. Given that we produce engines for many versions of the adblocker library, that would cost a lot: basically, every flag is a cost multiplier.
So a separate data structure may be the best compromise. We may want to reserve one bit to mark filters that have preprocessor conditions so we can skip the runtime checks (and cost) for the majority of filters.
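The reserved-bit idea could be sketched as follows. This assumes the bit marked `FREE` (`1 << 16`) in `NETWORK_FILTER_MASK` is used as the marker; `HAS_PREPROCESSOR`, `conditionResult`, and `isFilterActive` are hypothetical names for illustration only.

```typescript
// Assumption: bit 16 (marked FREE in NETWORK_FILTER_MASK) flags filters that
// sit inside a preprocessor condition. This is not an existing flag.
const HAS_PREPROCESSOR = 1 << 16;

interface FilterLike {
  id: number;
  mask: number;
}

// Side table kept outside the filters: filter ID -> is its condition satisfied
// in the current environment?
const conditionResult = new Map<number, boolean>();

function isFilterActive(filter: FilterLike): boolean {
  // Fast path: most filters have no condition, so a single bit test is enough.
  if ((filter.mask & HAS_PREPROCESSOR) === 0) {
    return true;
  }
  // Slow path: only flagged filters pay for the lookup. Unknown IDs are
  // treated as inactive in this sketch.
  return conditionResult.get(filter.id) ?? false;
}
```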
I got an idea about this case, and I think we can implement the preprocessor at both build time and runtime.
First, we need a bit and a byte to express:
- A bit indicating whether the `IFilter` object (which includes `NetworkFilter` and `CosmeticFilter` objects) has an additional byte field to express compatibility
- A byte for the compatibility table (bit window)
For example, for a network filter:
export const enum NETWORK_FILTER_MASK {
...
// Internals
hasCompatibilityTable = 1 << 30,
}
If we find the `1 << 30` bit set, the deserializer of `NetworkFilter` will read the next single byte to parse the compatibility table.
MASK [ 1 byte ]
COMPAT_MASK [ 1 byte ] (optional, decided by the 31st bit of MASK, i.e. 1 << 30)
...
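A rough sketch of how the optional byte could be written and read. It uses the standard `DataView` rather than the library's `StaticDataView`, and writes `MASK` as a 32-bit integer because the flag lives in bit 30; both choices, as well as the names `SerializableFilter`, `serialize`, and `deserialize`, are assumptions of this sketch.

```typescript
const HAS_COMPATIBILITY_TABLE = 1 << 30;

interface SerializableFilter {
  mask: number;
  compatMask?: number; // one byte, only present when the flag bit is set
}

function serialize(filter: SerializableFilter): Uint8Array {
  const compat = filter.compatMask;
  const bytes = new Uint8Array(compat === undefined ? 4 : 5);
  const view = new DataView(bytes.buffer);
  if (compat === undefined) {
    // No table: make sure the flag bit is cleared.
    view.setUint32(0, (filter.mask & ~HAS_COMPATIBILITY_TABLE) >>> 0);
  } else {
    // Table present: set the flag bit and append the extra byte.
    view.setUint32(0, (filter.mask | HAS_COMPATIBILITY_TABLE) >>> 0);
    view.setUint8(4, compat & 0xff);
  }
  return bytes;
}

function deserialize(bytes: Uint8Array): SerializableFilter {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const mask = view.getUint32(0);
  // Only look at the following byte when the flag bit says it exists.
  if ((mask & HAS_COMPATIBILITY_TABLE) !== 0) {
    return { mask, compatMask: view.getUint8(4) };
  }
  return { mask };
}
```

Filters without a compatibility table serialize to exactly the same number of bytes as before, so the cost is limited to the flagged filters.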
Also, we'll have an option to determine whether these filters need to be parsed at build time. To minimize the impact on the existing user base, I want this behavior to be opt-in.
For example, we can add the following option to `Config`:
...
loadAdditionalCompatibilityTable: boolean;
If `config.loadAdditionalCompatibilityTable` is set to `false`, the filter parser will skip the line.
Otherwise, the filter parser will parse and save an additional field in the filter object: `COMPAT_MASK` after `MASK`.
With this method, we only need to decide the behavior of the runtime implementation, because the build-time step won't parse the filter at all.
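To illustrate the opt-in behavior, here is a simplified sketch of a line parser. It only understands flat `!#if` / `!#endif` blocks (no nesting, no `!#else`), and `ParsedLine` and `parseLines` are hypothetical names, not the library's parser.

```typescript
interface Config {
  loadAdditionalCompatibilityTable: boolean;
}

interface ParsedLine {
  filter: string;
  condition?: string; // raw text of the enclosing `!#if`, if any
}

function parseLines(list: string, config: Config): ParsedLine[] {
  const result: ParsedLine[] = [];
  let condition: string | undefined;
  for (const raw of list.split('\n')) {
    const line = raw.trim();
    if (line.startsWith('!#if ')) {
      // Directive line: remembered only when the option is on, skipped otherwise.
      if (config.loadAdditionalCompatibilityTable) {
        condition = line.slice(5).trim();
      }
    } else if (line.startsWith('!#endif')) {
      condition = undefined;
    } else if (line.length !== 0 && !line.startsWith('!')) {
      // Regular filter line; comments (starting with `!`) are ignored.
      result.push(condition === undefined ? { filter: line } : { filter: line, condition });
    }
  }
  return result;
}
```

With the option off, the output is what existing users get today: every filter is loaded and the directives are ignored.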
After clearing my head, I have a more detailed solution to this and I'm working on it right now. First, I made a `PREPROCESSOR_MASK` enum type to express conditionals:
export const enum PREPROCESSOR_MASK {
isUnsupportedPlatform = 1 << 0,
isManifestV3 = 1 << 1,
isMobile = 1 << 2,
// RESERVE = 1 << 3,
// Browser specs
isBrowserChromium = 1 << 4,
isBrowserFirefox = 1 << 5,
isBrowserSafari = 1 << 6,
isBrowserOpera = 1 << 7,
// Capabilities
hasHtmlFilteringCapability = 1 << 8,
hasUserStylesheetCapability = 1 << 9,
// RESERVE = 1 << {10...12}
// Else
false = 1 << 13,
invalid = 1 << 14,
// RESERVE = 1 << 15
}
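As a sketch of how directive tokens could map onto these bits: the bit positions are copied from the enum above, while the token strings are assumed to follow the uBlock Origin wiki linked earlier and should be checked against it. `TOKEN_TO_MASK` and `tokenToMask` are hypothetical names.

```typescript
// Bit positions mirror PREPROCESSOR_MASK; token names are an assumption
// based on the uBlock Origin wiki and are not exhaustive.
const TOKEN_TO_MASK = new Map<string, number>([
  ['env_mv3', 1 << 1],
  ['env_mobile', 1 << 2],
  ['env_chromium', 1 << 4],
  ['env_firefox', 1 << 5],
  ['env_safari', 1 << 6],
  ['cap_html_filtering', 1 << 8],
  ['cap_user_stylesheet', 1 << 9],
  ['false', 1 << 13],
]);

const INVALID = 1 << 14;

// Unknown tokens map to the `invalid` bit instead of throwing.
function tokenToMask(token: string): number {
  return TOKEN_TO_MASK.get(token) ?? INVALID;
}
```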
For now, I expect we'll need at most 16 bits (uint16) of preprocessor masks. This makes it possible to support two operators. I'm going to allocate the upper 16 bits (from the left side) to the `OR` operator and the remaining 16 bits to the `AND` operator.
In other words, we're going to have MUST-have bits and OPTIONAL bits.
[OR]
1 << 31
...
1 << 16
[AND]
1 << 15
...
1 << 0
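The packing itself could look like the sketch below; `pack`, `orHalf`, and `andHalf` are hypothetical helper names. The `>>> 0` keeps the result unsigned, since setting bit 31 would otherwise produce a negative number in JavaScript.

```typescript
// OR conditions live in bits 16..31, AND conditions in bits 0..15.
function pack(orBits: number, andBits: number): number {
  return (((orBits & 0xffff) << 16) | (andBits & 0xffff)) >>> 0;
}

// Extract the upper 16 bits (OR span) as an unsigned value.
function orHalf(packed: number): number {
  return packed >>> 16;
}

// Extract the lower 16 bits (AND span).
function andHalf(packed: number): number {
  return packed & 0xffff;
}
```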
Also, by looking at uBlock Origin's source code, I found that they always evaluate tokens from left to right. This means the left side always has evaluation priority.
The following two are equivalent:
$token $op $token $op $token
((($token) $op $token) $op $token)
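A small sketch of strict left-to-right evaluation, assuming the tokens have already been resolved to booleans. `Op` and `evaluateLeftToRight` are hypothetical names, not uBlock Origin's implementation.

```typescript
type Op = '&&' | '||';

// `values` are the already-resolved tokens; ops[i] joins the running result
// with values[i + 1]. There is no operator precedence.
function evaluateLeftToRight(values: boolean[], ops: Op[]): boolean {
  if (values.length === 0 || ops.length !== values.length - 1) {
    throw new Error('malformed expression');
  }
  let result = values[0];
  for (let i = 0; i < ops.length; i += 1) {
    result = ops[i] === '&&' ? result && values[i + 1] : result || values[i + 1];
  }
  return result;
}
```

This differs from JavaScript's own precedence rules: `true || false && false` is `true` in JavaScript because `&&` binds tighter, while `evaluateLeftToRight([true, false, false], ['||', '&&'])` computes `(true || false) && false` and returns `false`.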
So we can parse backwards.
If the last `$op` is `AND`, we'll put the last `$token` in the `AND` mask span, which means the engine MUST have the `$token` capability to use the filter.
Otherwise, if the last `$op` is `OR`, we'll put the last `$token` in the `OR` mask span, which means the engine will respect this filter anyway.
The only problem in this case is having `OR` in the middle and `AND` at the end of the expression:
capA _AND_ capB _OR_ capC _AND_ capD
- Are `capA` and `capB` MUST in this case? We can't store this information.
- `capC` is optional in this case.
- `capD` is MUST in this case.
However, I expect this to be a rare case that we won't run into for now.
Alternative: have an optional reference to `Preprocessor` from `IFilter` and evaluate the conditional at runtime
Another alternative would be to have an optional reference to `Preprocessor` from `IFilter`.
export default interface Preprocessor {
id: number;
condition: number[]; // masks
operators: boolean[]; // corresponding operator per condition (mask)
}
import { StaticDataView } from '../data-view';
import Preprocessor from '../somewhere';
export default interface IFilter {
mask: number;
preprocessorRef?: Preprocessor['id'];
getPreprocessor: () => Preprocessor | undefined; // let me assume this feature is opt-in
getId: () => number;
getTokens: () => Uint32Array[];
serialize: (buffer: StaticDataView) => void;
getSerializedSize(compression: boolean): number;
}
The `Preprocessor` reference from `IFilter` will optionally be created at build time.
The advantage of this solution over the previous one is that we won't need extra fields whose size depends on the number of conditions and operators in `Preprocessor`, just an optional byte.
Also, there's no problem with serialization in `Preprocessor`.
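Runtime evaluation of such a `Preprocessor` could be sketched like this. The meaning of `operators` is an assumption here: `operators[i]` joins `condition[i]` with the result so far, `true` means `AND`, `false` means `OR`, and `operators[0]` is ignored. `evaluate` is a hypothetical helper.

```typescript
interface Preprocessor {
  id: number;
  condition: number[]; // masks
  operators: boolean[]; // assumption: true = AND, false = OR; operators[0] is unused
}

// Evaluates strictly left to right against the environment mask.
function evaluate(p: Preprocessor, env: number): boolean {
  let result = true;
  for (let i = 0; i < p.condition.length; i += 1) {
    // A condition is satisfied when all of its bits are present in `env`.
    const satisfied = (env & p.condition[i]) === p.condition[i];
    if (i === 0) {
      result = satisfied;
    } else {
      result = p.operators[i] ? result && satisfied : result || satisfied;
    }
  }
  return result;
}

// capA AND capB OR capC AND capD, with arbitrary bits chosen for the example.
const [capA, capB, capC, capD] = [1 << 0, 1 << 1, 1 << 2, 1 << 3];
const mixed: Preprocessor = {
  id: 1,
  condition: [capA, capB, capC, capD],
  operators: [true, true, false, true],
};
```

Unlike the two-span mask, this representation keeps the order of operators, so the mixed expression is evaluated as `((capA AND capB) OR capC) AND capD` without losing information.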
Update 1: I made minimal changes to the filter parser to show how the last alternative would work: https://github.com/ghostery/adblocker/compare/master...seia-soto:adblocker:add-preprocessor?expand=1
The core changes are in `lists.ts` and `preprocessor.ts`.