AdguardTeam/HostlistCompiler

Remove subdomains if their parent domain is already included

ameshkov opened this issue · 2 comments

@Yuki2718 commented on Sat Mar 28 2020

Related: AdguardTeam/AdguardFilters#47398 (comment)
As the DNS filter uses ABP syntax, there is no point in including subdomains if their parent domains are already included. So why not add a subdomain removal step when compiling the list? That would save a little bandwidth for those users.

Basically, it means that we should extend "Compress" transformation to ABP-style lists:
https://github.com/AdguardTeam/HostlistCompiler#compress

I came here to ask the exact same thing. Obviously we can't just chop off subdomains and block the parent across the board; otherwise ||metrics.apple.com^ would turn into ||apple.com^ and block all of apple.com, which would usually be undesirable.

However, if it's possible to implement this in a 'smart' way that would be awesome. For example, if ||parent.com^ exists then delete the duplicate ||subdomain.parent.com^ entries. Likewise, if ||*telemetry*^ exists, delete all ||telemetry.domain.com^ entries because they're already covered... etc.
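A minimal sketch of such a "smart" pass, written in Python for illustration only; it is not the compiler's actual implementation. The function names (`parse_domain_rule`, `wildcard_to_regex`, `remove_redundant_rules`) are hypothetical, and the wildcard handling is a simplified reading of `||...^` rules in which `*` matches any run of characters. Only plain `||domain^` rules are ever dropped; comments, exception rules, and rules with modifiers are left untouched.

```python
import re


def parse_domain_rule(rule):
    """Return the pattern between '||' and '^' for a plain blocking rule, else None."""
    match = re.fullmatch(r"\|\|([^\s^$/]+)\^", rule.strip())
    return match.group(1).lower() if match else None


def wildcard_to_regex(pattern):
    """Simplified model: '*' matches any characters; '||' allows any subdomain prefix."""
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(r"(?:.*\.)?" + body)


def is_covered_by_parent(domain, plain_domains):
    """True if any proper parent of `domain` is itself blocked by a plain rule."""
    labels = domain.split(".")
    return any(".".join(labels[i:]) in plain_domains for i in range(1, len(labels)))


def remove_redundant_rules(rules):
    """Drop plain domain rules already covered by a parent or wildcard rule."""
    parsed = [parse_domain_rule(rule) for rule in rules]
    plain_domains = {p for p in parsed if p and "*" not in p}
    wildcards = [wildcard_to_regex(p) for p in parsed if p and "*" in p]

    seen = set()
    result = []
    for rule, domain in zip(rules, parsed):
        # Keep anything that is not a plain ||domain^ rule exactly as it is.
        if domain is None or "*" in domain:
            result.append(rule)
            continue
        if (
            domain in seen
            or is_covered_by_parent(domain, plain_domains)
            or any(w.fullmatch(domain) for w in wildcards)
        ):
            continue
        seen.add(domain)
        result.append(rule)
    return result


rules = [
    "! comment",
    "||parent.com^",
    "||subdomain.parent.com^",
    "||*telemetry*^",
    "||telemetry.domain.com^",
    "||metrics.apple.com^",
]
cleaned = remove_redundant_rules(rules)
```

Here `cleaned` keeps `||metrics.apple.com^` as it is, since no rule for `apple.com` exists, while the `subdomain.parent.com` and `telemetry.domain.com` entries are removed because other rules already cover them.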

I hoped this tool would already do that, but it just deleted a few comment lines from "List source 1" and tacked "List source 2" onto the end, without accounting for any of the above. The resulting file was 500,000 lines long, whereas with the processing described above it would have been at most half that size. I hope this can be figured out somehow, so please consider this a +1.