collinbarrett/FilterLists

My New Filterlist

thedoggybrad opened this issue · 10 comments

I have a few questions (not affiliated with this project, just curious).
Where do you get the entries for this list? The README mentions Phishing Domain Database and The Big List of Hacked Malware Web Sites.
Also, there are a lot of duplicated entries:
image
Thanks!

@iam-py-test, what tool did you use to find those duplicates? Just curious.

Visible in uBO

image

> @iam-py-test, what tool did you use to find those duplicates? Just curious.

I used https://abpvn.com/ruleChecker/redundantRuleChecker.html (DandelionSprout recommends it in the adfilt README, that's how I found it), but @gwarser's method works too (though this shows the specific redundant rules).
I am working on a PR to remove some of the redundant rules, but there are too many to do by hand and my Python script keeps wanting to change the line endings from CRLF to LF, which makes the diff show I changed every single line.
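
One way around the line-ending problem is to filter the file without rewriting the lines that stay. The sketch below is not the script mentioned above. It is a hypothetical shell helper (`remove_rules` is a made-up name) that assumes a Unix-like awk. It drops the listed rules and prints every kept line exactly as it was read, so CRLF endings survive and the diff shows only the removed lines. In Python, opening the file with `newline=''` similarly turns off newline translation.

```shell
# remove_rules LIST DROP: print LIST without the rules named in DROP.
# Both file arguments are placeholders for illustration.
remove_rules() {
    awk '
        { key = $0; sub(/\r$/, "", key) }            # compare without a trailing CR
        FILENAME == ARGV[1] { drop[key] = 1; next }  # first file: rules to drop
        !(key in drop)                               # second file: print kept lines untouched
    ' "$2" "$1"
}
```

It could be used as `remove_rules adblock.txt redundant_entries.txt > adblock.new.txt`, with the result reviewed before it replaces the original file.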

I recently had to deal with this issue on my own blocklist. Here is a snippet of code in Bash to find redundant entries:

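while read -r entry; do
    # Strip the leading "||" and escape dots so grep matches them literally
    pattern=${entry#||}
    pattern=${pattern//./\\.}
    # Collect entries that are subdomains (of any level) of the current entry
    grep "\.${pattern}$" adblock.txt
done < adblock.txt > redundant_entries.txt

# The output has a high chance of having duplicates
sort -u redundant_entries.txt -o redundant_entries.txt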

This assumes your list only has entries in the form of ||example.com^. The code loops through each entry and converts it into a pattern to be matched by grep. grep looks for other entries that are subdomains (of any level) of the current entry. The whole process is slow: it takes about 45 seconds for my 2,300-rule ABP list.
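
Much of that time likely goes to starting one grep process per entry, each of which rescans the whole file. A minimal sketch of a different technique follows: a single awk run that stores every entry in a hash and then looks up each entry's parent domains. It works under the same assumption that every entry has the form ||example.com^. `find_redundant` is a hypothetical name, not part of any tool mentioned in this thread.

```shell
# find_redundant LIST: print entries whose parent domain is also in LIST.
# Assumes every line looks like ||example.com^ (optionally CRLF-terminated).
find_redundant() {
    awk '
        { sub(/\r$/, "") }                         # ignore a trailing CR, if any
        NR == FNR { seen[$0] = 1; next }           # pass 1: remember every entry
        {
            # pass 2: drop the leading "||" and the trailing "^"
            parent = substr($0, 3, length($0) - 3)
            # strip one leading label at a time and look up each parent
            while (sub(/^[^.]*\./, "", parent)) {
                if (("||" parent "^") in seen) { print; break }
            }
        }
    ' "$1" "$1"
}
```

Running `find_redundant adblock.txt > redundant_entries.txt` reads the list twice instead of once per entry, and each redundant entry is printed once, so no separate sort -u pass is needed for deduplication.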

I'm going to feed the redundant entries file into my list building script so it ignores the entries in the file.

I will fix those duplicates. I had not checked for them.

> I have a few questions (not affiliated with this project, just curious). Where do you get the entries for this list? The README mentions Phishing Domain Database and The Big List of Hacked Malware Web Sites. Also, there are a lot of duplicated entries. Thanks!

You are right. I just compiled the entries from those sources.

Also, one small comment on the README. IMO "uBlock" is garbage and shouldn't be recommended as an option to use this list with; it was unmaintained for years, then recently removed its code from GitHub and started pushing updates again. The developer(s) have done shady stuff in the past (tracking users, stealing code), and the extension doesn't even have a functional options page, so it's not possible to install any non-default lists in it:
image
It's also blocked as malicious by several blocklists, including uBO's default Badware risks list.

@iam-py-test Thanks for that, removing it ASAP from the READMEs of all my filterlists.
(Update: Successfully removed from the READMEs of all my filterlists.)

By the way, the duplicate filters are fixed.

@iam-py-test
Thanks for making me aware of what is happening with uBlock now. It used to look almost the same as uBlock Origin.
What I know is that uBlock is the original one, but due to conflicts between the two repository owners, the original owner created uBlock Origin. I had previously read recommendations in the issues of uBlock Origin's filter list repository suggesting not to use uBlock. I was surprised to learn that the GitHub code for uBlock has been removed, and I immediately looked for it myself. I am not actually a fan of uBlock either.

By the way, I am using uBlock Origin on my web browsers, so I am definitely not testing my filterlists on other ad blockers.