yubiuser/pihole_adlist_tool

Possibility to Deactivate redundant Blocklists

HomemadeAdvanced opened this issue · 7 comments

Hello,

is there a possibility to show and/or deactivate blocklists that are redundant, so that the number of unique domains on the first page does not change and only the unnecessary blocklists with their domains are deactivated? Thanks in advance.

I'm not sure I understand what you want to achieve: you can select
"Enable the minimal number of adlists that cover all domains that would have been blocked"

which will enable only the smallest set of adlists necessary to block everything that would have been blocked. If you enable "Enable only adlists with covered unique domains" you might miss some domains that are not unique (e.g. contained in only two adlists).

Currently I have many domains covered by many lists: the total number of blocked domains is twice the number of unique domains. These redundant domains should be reduced, but without factoring in the domains that were actually visited recently.

So what you want is:

"If an adlist contains only domains that are also part of other adlists, deactivate this adlist"? And check this for all adlists at the same time, so that the maximum number of adlists can be disabled?

(This would all be independent of your browsing behavior.)


This is not possible with the current tool. It is designed to be based on your browsing habits, not focusing on the adlists alone. But I do see some value in your idea. I'll think about it.
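The per-adlist check described above could be sketched in shell, building on Pi-hole's per-list `*.domains` files. This is a rough sketch, not part of the tool: the `find_redundant_lists` function name and the directory argument are assumptions, and it simply flags every list whose domains are fully covered by the union of all the other lists.

```shell
#!/usr/bin/env bash
# Sketch (not part of pihole_adlist_tool): flag every adlist whose
# domains are fully covered by the union of all the other adlists.
# Assumes one *.domains file per adlist in the given directory, file
# names without spaces, and at least two lists.
find_redundant_lists() {
  local dir="$1" list others
  for list in "$dir"/*.domains; do
    # Union of every other list, sorted and de-duplicated
    # (comm requires sorted input).
    others=$(mktemp)
    sort -u $(ls "$dir"/*.domains | grep -F -x -v "$list") > "$others"
    # comm -23 prints lines unique to $list; no output means every
    # domain of $list also appears in some other list.
    if [ -z "$(comm -23 <(sort -u "$list") "$others")" ]; then
      echo "$(basename "$list") is covered by the other lists"
    fi
    rm -f "$others"
  done
}
```

Note that this flags lists one at a time: disabling all flagged lists simultaneously can remove too much (two identical lists would each be flagged, but only one of them can safely be dropped), so the check would have to be re-run after each list is disabled.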

That's exactly what I would like to use. It would be great if it could be implemented.

I think this can also never be made fully reliable, because adlists change independently: list1 may add or remove domains that are absent from, or still present in, list2. Pi-hole already de-duplicates all domains across lists ("unique domains"):

[i] Number of gravity domains: 4514301 (4038026 unique domains)
[i] Number of exact blacklisted domains: 24
[i] Number of regex blacklist filters: 25
[i] Number of exact whitelisted domains: 13
[i] Number of regex whitelist filters: 4


But… wait… if you're interested, here is a brute-force solution that compares all lists with each other. Be careful when you have "many" adlists, because the comparison runs (number of adlists − 1) × (number of adlists) ÷ 2 times:

for d1 in *.domains; do
  for d2 in *.domains; do
    # Stop the inner loop at d1 itself so each pair is compared only once.
    [ "$d1" = "$d2" ] && break
    echo "$d1 vs. $d2"
    # comm needs sorted input; -3 suppresses lines common to both files,
    # so the count is the number of domains unique to either list.
    comm -3 <(sort "$d1") <(sort "$d2") | wc -l
  done
done

Each result with "0" shows you two lists whose contents are 100% the same.
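The same `comm` trick can be extended from exact duplicates to strict subsets: `comm -23` prints only the lines unique to the first file, so an empty result means the first list is completely contained in the second. The `check_subsets` helper below is a hypothetical sketch of that, not tested against a real Pi-hole installation:

```shell
#!/usr/bin/env bash
# Sketch: report every adlist that is a subset of (or equal to) another
# single adlist. Unlike the duplicate check, subset containment is
# asymmetric, so all n*(n-1) ordered pairs are compared.
check_subsets() (
  cd "$1" || return 1
  for d1 in *.domains; do
    for d2 in *.domains; do
      [ "$d1" = "$d2" ] && continue
      # comm -23 prints lines only in the first file; empty output
      # means every domain of d1 also appears in d2.
      if [ -z "$(comm -23 <(sort -u "$d1") <(sort -u "$d2"))" ]; then
        echo "$d1 is a subset of (or equal to) $d2"
      fi
    done
  done
)
```

A list reported as a subset of another enabled list can be disabled without losing any blocked domain, which goes a bit further than only finding 100% duplicates.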

For testing purposes I copied one adlist to prove that a "100% the same" duplicate will be found by this:

cp list.1.raw.githubusercontent.com.domains zzz.list.1.raw.githubusercontent.com.domains

❓ Does this help you and can this issue be closed?

See man page:

-3 suppress column 3 (lines that appear in both files)

Thanks, I will have a look into it when I have the time. This seems like a valid solution. With the lists from https://codeberg.org/HomemadeAdvanced/PiHole/src/branch/main/PiHoleAdlistsGermany.txt I have 27734794 domains in total, of which 17329998 are unique.