j-moriarti/pDNSf-Hosts-collection

Optimize/compress the output files.

badmojr opened this issue · 4 comments

Hello!
Just wanted to bring this to your attention as (DNS) filterlist maintainers are starting to implement this.
e.g.
hectorm/hblock@21b4c3d
mkb2091/blockconvert@498dc47#diff-598405dde0944f981bc7fa79065f7006865f1169c1c2766ae6ab620e1136df9aR351

I've tested the current list (2021-03-14), and the domain count dropped significantly once the redundant subdomains (1,465,313) were removed.
Cheers!

Hi!
Thanks for letting me know about this! I had actually tried something similar in the past, but my code was not optimized and took around 30-60 minutes to remove the subdomains from a 60 MB block-list!
With the info you mentioned, I re-implemented it here (73d3874), and now it takes only a few seconds! Thanks again for your help! 👍

It would be great if you let me know about any issues or enhancements of this kind! I will be more than happy to hear about them and fix them 😉 😄
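
For readers following along, here is a minimal sketch of the kind of redundant-subdomain removal discussed above. It is not the code from 73d3874 or from the linked hblock/blockconvert commits. The function name `remove_redundant_subdomains` is hypothetical, and the sketch assumes that blocking a domain also blocks every subdomain beneath it.

```python
def remove_redundant_subdomains(domains):
    """Return the domains that have no parent domain in the same list.

    Assumption: the consumer of the list treats an entry as covering
    all of its subdomains, so "cdn.taboola.com" is redundant once
    "taboola.com" is present.
    """
    # Normalise entries and put them in a set for O(1) membership tests.
    blocked = {d.strip().lower().rstrip(".") for d in domains if d.strip()}

    kept = []
    for domain in blocked:
        labels = domain.split(".")
        # Walk every proper parent: a.b.example.com -> b.example.com,
        # example.com, com. One set lookup per parent.
        has_parent = any(
            ".".join(labels[i:]) in blocked for i in range(1, len(labels))
        )
        if not has_parent:
            kept.append(domain)
    return sorted(kept)


if __name__ == "__main__":
    sample = ["taboola.com", "cdn.taboola.com", "a.cdn.taboola.com", "example.org"]
    print(remove_redundant_subdomains(sample))
```

Because each domain needs only a handful of set lookups (one per label), this kind of approach scales roughly linearly with the list size, which is consistent with a run time of seconds rather than minutes.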

Quite a lot of redundant subdomains have been removed.
There are still quite a lot, though, that the script missed.
e.g. subdomains for taboola.com

Oh, nice catch!
It seems there was an issue with the hyphen (-) being ordered ahead of the dot (.) when sorting the list.
It should be fixed by 0d0595a (hopefully).
Thanks again for helping and reporting this! 👍
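
To illustrate how the `-` versus `.` ordering mentioned above can hide subdomains, here is a hedged sketch. It is not the code from 0d0595a, and it is only one way such a bug could arise. It assumes a sort-then-scan approach that compares each entry with the last kept one. The helper names and the domain `taboola-cdn.com` are made up for the example. In ASCII, `-` (0x2D) sorts before `.` (0x2E), so a plain string sort can place an unrelated hyphenated domain between a parent and its subdomains.

```python
def dedupe_sorted(domains, key):
    """Sort reversed-label tuples with `key`, then drop any entry that
    falls under the most recently kept entry (single linear pass)."""
    reversed_labels = {tuple(reversed(d.split("."))) for d in domains}
    kept = []
    for labels in sorted(reversed_labels, key=key):
        # An entry is redundant if its leading labels equal the last kept entry.
        if kept and labels[: len(kept[-1])] == kept[-1]:
            continue
        kept.append(labels)
    return sorted(".".join(reversed(labels)) for labels in kept)


def string_key(labels):
    # Plain string comparison: "com.taboola-cdn" sorts before
    # "com.taboola.cdn" because "-" < "." in ASCII.
    return ".".join(labels)


def label_key(labels):
    # Label-by-label comparison: a parent is always followed directly
    # by its own subdomains.
    return labels


if __name__ == "__main__":
    sample = ["taboola.com", "cdn.taboola.com", "taboola-cdn.com"]
    print(dedupe_sorted(sample, string_key))  # cdn.taboola.com is missed
    print(dedupe_sorted(sample, label_key))   # cdn.taboola.com is removed
```

Comparing label tuples instead of joined strings keeps every subdomain adjacent to its parent, so the single-pass check no longer skips entries.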

No more redundant subdomains. 👍
Keep up the good work!