iam-py-test/my_filters_001

[bug] Pihole reports some entries are not domains

Closed this issue · 25 comments

What extension/adblocker/firewall do you have?

Pihole

What list(s) are you using?

The malicious website blocklist

What is the issue?

Pihole reports this when using the list:

[i] Target: https://raw.githubusercontent.com/iam-py-test/my_filters_001/main/Alternative%20list%20formats/antimalware_abp.txt
[✓] Status: Retrieval successful
[✓] Parsed 0 exact domains and 9784 ABP-style domains (ignored 509 non-domain entries)
Sample of non-domain entries:
- "||infinitypr.in^$3p"
- "||www.robint.us^$3p"
- "||154.35.32.5^"
- "||37.0.11.8^"
- "||49.143.32.6^"

509 non-domain entries exist. Is this a bug, or is it correct (a feature)?

Kind regards
Peter

Hi, @pallebone! Thank you for opening an issue!
@iam-py-test will get to this as soon as possible

The ABP version is designed for AdBlock Plus, which supports $ modifiers (e.g. $3p) and IP addresses. PiHole seems not to know what to do with them, thus the error.
There is a pure domains version I originally intended for PiHole (and other similar software), but given PiHole supports ABP/uBo syntax minus modifiers (the $ stuff), I could create an ABP version for it which has the $ modifiers and IPs stripped out.
All of this is complicated by the fact that I lack an install of PiHole.
I am going to do a little research to confirm my hypothesis, and work on creating a PiHole-specific ABP-style list (which should be easy once I know how to do it properly).
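
For illustration, the stripping step could look roughly like the Python sketch below. It is not the script this repository actually uses: the names `is_ip` and `to_pihole_rule` are hypothetical, and it assumes every rule of interest has the form `||host^`, optionally followed by `$` modifiers.

```python
import ipaddress


def is_ip(host: str) -> bool:
    """Return True if host parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def to_pihole_rule(line: str):
    """Return '||host^' for a plain ABP domain rule, or None if the line
    should be left out of the domains-only list (hypothetical helper)."""
    line = line.strip()
    if not (line.startswith("||") and "^" in line):
        return None  # comments, headers, and other rule types
    # Everything after the first '^' (e.g. '$3p') is a modifier and is dropped.
    host, _, _modifiers = line[2:].partition("^")
    if not host or "/" in host or "*" in host or is_ip(host):
        return None  # paths, wildcards, and bare IPs are not plain domains
    return f"||{host}^"


sample = ["||infinitypr.in^$3p", "||154.35.32.5^", "! a comment"]
print([to_pihole_rule(s) for s in sample])
```

Dropping a modifier such as `$3p` makes the rule broader than the original, which seems acceptable for a DNS-level blocker that cannot tell first-party from third-party requests anyway.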

Interesting, thank you for this information. I did not know ABP supports IPs.

Kind regards
Peter

Well, it doesn't support IPs, it just treats them as domains. But PiHole clearly can't do that.
Thank you for reporting this.

Interesting, what is the use of treating an IP like it was a DNS entry? I find this a confusing feature. If no DNS lookup is made, what is the point?

I'm sorry, but I'm confused. If you are talking about ABP, it filters at the browser level, not at the DNS level.

I see what you mean. Makes sense.

Can you delete the old list and try https://raw.githubusercontent.com/iam-py-test/my_filters_001/main/Alternative%20list%20formats/antimalware_abp_domainsonly.txt? Please tell me if there are any errors so I can investigate further. Thanks

Thank you for this new list. It is appreciated.
There are still some errors... can you check?

[i] Target: https://raw.githubusercontent.com/iam-py-test/my_filters_001/main/Alternative%20list%20formats/antimalware_abp_domainsonly.txt
[✓] Status: Retrieval successful
[✓] Parsed 0 exact domains and 8950 ABP-style domains (ignored 6 non-domain entries)
Sample of non-domain entries:
- "||demoniacipő.com^"
- "||demoniatürkiye.com^"
- "||sauconyløbesko.com^"
- "||keentürkiye.com^"
- "||nobulltürkiye.com^"

Kind regards
Peter

Thanks. That's an easy fix, luckily.
Edit: Turns out fixing this wasn't as easy as it looked. Stand by, I'm investigating further

Thank you for investing the time to investigate.

Sadly, the library I use (idna) can't handle encoding these domains, so my code "fails softly" and just returns the original, unencoded versions.
I could:

  • remove the domains, though that doesn't fix anything, so this will come back up again someday
  • just exclude domains which cause an error (I think I will do this for now, but I don't think this is a permanent solution)
  • find another library which can handle them (I would prefer to do this, but it will require some time/effort on my part to find & test various libraries)
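
The second option (skip whatever fails to encode) can be sketched as follows. This is an illustration under stated assumptions, not the list's real build script: it uses Python's built-in `idna` codec (IDNA 2003) rather than the third-party `idna` package mentioned above, and the two may accept or reject different names. `to_ascii` and `encode_all` are hypothetical helper names.

```python
def to_ascii(domain: str):
    """Return the ASCII (punycode) form of domain, or None if the
    built-in codec refuses to encode it."""
    try:
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        return None


def encode_all(domains):
    """Split domains into (encoded, skipped) instead of silently
    passing unencodable names through unchanged."""
    encoded, skipped = [], []
    for domain in domains:
        ascii_form = to_ascii(domain)
        if ascii_form is None:
            skipped.append(domain)
        else:
            encoded.append(ascii_form)
    return encoded, skipped


# The over-long label below is a made-up example of a name that fails.
encoded, skipped = encode_all(["keentürkiye.com", "example.com", "a" * 64 + ".com"])
print(encoded)
print(skipped)
```

Keeping the skipped names in a separate list makes it possible to log them, so the failures stay visible rather than disappearing from the output.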

Are these real domains? I.e., is keentürkiye.com a real domain and Pihole is at fault (so I should log a bug with Pihole), or is the domain fake and thus invalid (so Pihole is correct)?

Kind regards
Peter

I'm not sure about those specific domains (they don't resolve), but there are valid domains with those characters in them.
As to whether PiHole is misbehaving here, I'm afraid I lack the knowledge to answer that.
I think I have found a solution, but I'm not 100% comfortable with it.

Can you update the list and check if there are any errors?
Thanks

There was an improvement, so I think you fixed something, but some other domains got flagged up, I think for a different reason:

[i] Target: https://raw.githubusercontent.com/iam-py-test/my_filters_001/main/Alternative%20list%20formats/antimalware_abp_domainsonly.txt
[✓] Status: Retrieval successful
[✓] Parsed 0 exact domains and 8957 ABP-style domains (ignored 2 non-domain entries)
Sample of non-domain entries:
- "besko-xhb593g.com^"
- "vlerhunter-7sb517i.com^"

I think they are supposed to have ||

Those lines are complete filters; they just have spaces in them. It seems PiHole is ignoring everything before the space, and thus making the entry invalid. I will have to get PiHole set up and test. Thanks

Well, this seems to be an issue with the function I'm using to encode these; it adds spaces into the domains, and spaces aren't valid in domains. 🤦‍♀️ Well, back to square one. It's too late today for me to look further into this, so just going to have the script ignore domains it can't encode until I find a better solution.
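
One way to catch this kind of problem before publishing is to check every generated line against the expected `||domain^` shape and set aside anything that fails. A minimal sketch follows; the pattern is my own guess at a conservative definition of a plain ASCII domain and is not taken from Pi-hole's source, and `split_rules` is a hypothetical name.

```python
import re

# A label is letters, digits, and hyphens, not starting or ending with a hyphen.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# A rule is '||', two or more dot-separated labels, then '^'. No spaces allowed.
_RULE = re.compile(rf"\|\|{_LABEL}(?:\.{_LABEL})+\^")


def split_rules(lines):
    """Return (good, bad): rules matching the strict '||domain^' shape,
    and everything else (spaces, raw Unicode, missing '||', and so on)."""
    good, bad = [], []
    for line in lines:
        rule = line.strip()
        if _RULE.fullmatch(rule.lower()):
            good.append(rule)
        else:
            bad.append(rule)
    return good, bad


# Made-up inputs: one clean rule, one with a space, one left unencoded.
good, bad = split_rules(["||example.com^", "||foo bar.com^", "||keentürkiye.com^"])
print(good)
print(bad)
```

Running a check like this at the end of the build would flag entries with embedded spaces or unencoded characters without needing a PiHole install to test against.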

That's weird that there were only 2 causing an issue. It looked like you almost fixed it.

Sometimes it's good just to have a rest ❤️
Have a good sleep :)

I'm closing this issue; apologies for wasting your time. You can see my comment on the other ticket.

Kind regards
Peter

Sorry, I just want to ask: will I still be able to use this list?
https://raw.githubusercontent.com/iam-py-test/my_filters_001/main/Alternative%20list%20formats/antimalware_abp_domainsonly.txt

Or do you not intend to keep it?

Kind regards
Peter

I intend to keep it. It still has value to PiHole users (other than those few domains I couldn't get PiHole to be happy with, the rest of the list is still there), and there's pretty much 0 cost in terms of maintenance as it's just a script which automatically runs.

Ok that is great, the list is currently working without any errors:

[i] Target: https://raw.githubusercontent.com/iam-py-test/my_filters_001/main/Alternative%20list%20formats/antimalware_abp_domainsonly.txt
[✓] Status: Retrieval successful
[✓] Parsed 0 exact domains and 8962 ABP-style domains (ignored 0 non-domain entries)
[i] List stayed unchanged

Thank you.

P