duckduckgo/tracker-radar

hosts file with all domains included

beerisgood opened this issue · 31 comments

Is it possible for you to create a hosts file that includes all domains?
Then we could use it with, e.g., Pi-hole.

Would be awesome!

Is it possible for you to create a hosts file that includes all domains?

What would be the impact on system performance?

Is it possible for you to create a hosts file that includes all domains?

What would be the impact on system performance?

On PiHole? None.
My old Raspberry Pi 2B blocks 1.8 million domains and the CPU is at 0.x% with memory at ~27%.

My thoughts exactly. This would be awesome for Pi-hole users!

My old Raspberry Pi 2B blocks 1.8 million domains and the CPU is at 0.x% with memory at ~27%.

Have you measured kernel overhead?
Have you done some stress tests to confirm what you say?

Have you measured kernel overhead?

I don't know what you mean.

Have you done some stress tests to confirm what you say?

Sure. I have been using Pi-hole for many years now.

I wouldn't include it here, but anyone is welcome to use this data to build their own hosts file.
It might require some testing and consideration for what you want to include. My guess is taking all 5k+ domains and turning them into a hosts file would cause some breakage.

Hi fellow Pi-Hole users, I also wanted a hostlist for Pi-Hole, so I have just generated this list:
https://gitlab.com/michelt/ddg-tracker-radar-hostfile/-/raw/master/hostlist.txt
I used the Domains folder to get all the domains; I assumed that the domains in the Entities folder were also in the Domains folder. Am I correct, @jdorweiler?

That works, but watch out for some of the domain categories. Including everything might cause a lot of breakage https://github.com/duckduckgo/tracker-radar/blob/master/domains/cloudflare.com.json#L5998

Hm, you are right, my list is a little short-sighted. I think I will filter all the domains used for CDNs out of the list by default and make a separate list with all domains (including CDNs), for the folks who would rather whitelist than blacklist.

Going to have to experiment with this so ads/trackers can be added to the HOSTS generated at https://github.com/StevenBlack/hosts as well as a LittleSnitch subscription set.

I would love to see the list done by DuckDuckGo directly for my pi-hole as well. I'm sure at this point I might have most of them. @Michelenzoo I looked through your list and would love to see it sorted alphabetically.

@turtle2472 Good one. They are now sorted.

sebrk commented

https://gitlab.com/michelt/ddg-tracker-radar-hostfile/-/raw/master/hostlist.txt seems to block stuff it shouldn't. Ironically it blocked duckduckgo.com for me and even logging into GitHub (github.com/login).

Going to have to experiment with this so ads/trackers can be added to the HOSTS generated at https://github.com/StevenBlack/hosts as well as a LittleSnitch subscription set.

What does StevenBlack's list have to do with your own?

https://gitlab.com/michelt/ddg-tracker-radar-hostfile/-/raw/master/hostlist.txt seems to block stuff it shouldn't. Ironically it blocked duckduckgo.com for me and even logging into GitHub (github.com/login).

I see, so does that imply that DuckDuckGo tracks us? :)
BTW, it is on the list because my generator script is pretty dumb. It iterates over all the files in the domains folder and adds each domain to hostlist.txt if its categories field is either empty or does not contain the words 'cdn' or 'online payment'.
As you can see here, the DuckDuckGo file has no categories. This results in the script just adding the domain to the list.
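For anyone following along, the rule described above could be sketched roughly like this. This is not the actual generator script, just a guess at its logic: the `domain` key, the helper names `should_block` and `build_hostlist`, and the directory layout are assumptions; only the `categories` field and the 'cdn' / 'online payment' words come from the description above.

```python
import json
from pathlib import Path

# Words mentioned in the comment above; a category containing one of them is skipped.
EXCLUDED_WORDS = ("cdn", "online payment")


def should_block(categories):
    """Return True when categories is empty or mentions none of the excluded words."""
    lowered = [category.lower() for category in categories]
    return not any(word in category for category in lowered for word in EXCLUDED_WORDS)


def build_hostlist(domains_dir):
    """Collect every domain whose JSON file passes should_block()."""
    hosts = []
    for path in Path(domains_dir).rglob("*.json"):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: each file has a "domain" key; fall back to the filename otherwise.
        domain = data.get("domain", path.stem)
        if should_block(data.get("categories", [])):
            hosts.append(domain)
    return sorted(hosts)
```

Note that `should_block([])` is true, which is exactly why a file with no categories (like the DuckDuckGo one) ends up on the list.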

sebrk commented

Yes, someone (including myself) should have a proper look at the data and create a structured filter.
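As a starting point, a stricter filter might only block a domain when it carries at least one tracking-related category, and never when it is on a hand-maintained allowlist. This is only a sketch: the category names in `BLOCK_CATEGORIES`, the `ALLOWLIST` entries, and the function name `should_block_strict` are assumptions for illustration and would need checking against the real data.

```python
# Assumed category names; verify them against the actual domain files.
BLOCK_CATEGORIES = {"ad motivated tracking", "advertising", "analytics"}
# Categories named earlier in this thread as likely to cause breakage when blocked.
KEEP_CATEGORIES = {"cdn", "online payment"}
# Hypothetical allowlist for domains reported as wrongly blocked.
ALLOWLIST = {"duckduckgo.com", "github.com"}


def should_block_strict(domain, categories):
    """Block only domains with a blocking category, no keep category, and not allowlisted."""
    found = {category.lower() for category in categories}
    if domain in ALLOWLIST:
        return False
    if found & KEEP_CATEGORIES:
        return False
    # An empty or unrecognised category list no longer results in a block.
    return bool(found & BLOCK_CATEGORIES)
```

With this rule a domain with no categories is left alone instead of being blocked by default, at the cost of missing trackers that have not been categorised.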

Sadly, even the non-full list blocks invidio.us.
invidio.us is a YouTube frontend with better privacy. I don't know why it gets blocked.

Going to have to experiment with this so ads/trackers can be added to the HOSTS generated at https://github.com/StevenBlack/hosts as well as a LittleSnitch subscription set.

What does StevenBlack's list have to do with your own?

Because I use his as a base for my own and would contribute this back into his for the greater good of others who use his.

rd-su commented

Also add support for uBlock Origin and other content blockers.

Using the list to block third-parties...

Has any of you looked into the files? It would be better to create a script that only takes the URLs that contain trackers or something similar. Maybe I will look deeper into it, but it could take a moment or two, because JSON isn't my best friend. I think I have an idea, though.

Bullshit detected:
More problematic would be the regular expressions used in the files, because they don't conform to any ad/DNS blocker, if I see it correctly.

Edit: Looked around a little. I misunderstood some of you about the iterating and the files, and thought you were only taking the filenames.

It should be possible to do this with the break conditions and some manual editing after testing, but huge, automatically generated lists like this one will never be 100% perfect, and anyone can run into false positives that can't be sorted out.

If anyone would like help, I'd be glad to pitch in and learn some new stuff.

There's some good discussion on this in the pihole subreddit. https://old.reddit.com/r/pihole/comments/fdws51/duckduckgo_tracker_radar/fjkkzjq/

@jdorweiler why closing?

It's better to handle this in the specific client repos that would use a hosts list.

TPS commented

@jdorweiler So, just to clarify, y'all don't want to publish any kind of app-ready final product (even just plain text or hosts list), but are providing this repo solely so that developers can format the data themselves & include into their own apps?

@TPS that's right.

Yikes. So instead of making something useful (a la Let’s Encrypt), this is just an academic project? A hosts file would be extremely pragmatic (and probably not much work for you people to put together for the community to get a whoooooole lot of goodwill).

Following on that comment: what is the point of publishing this project?? Do you want to make some difference, or do you just want the techmeme referrals?

I think what they provided is more than enough as they didn't have to do it to begin with. They are providing this as a courtesy of what they found. It's up to others to figure out what to do with it.

Providing a hosts file for something that is unknown is not wise and can create issues for others.

What an awful apologist comment.

@thefaj Your comment is completely disrespectful and a shame for everyone who worked on the project and did a good job. @jdorweiler provided a link to a reddit thread where they discussed the problem and why it's not useful to put these domains in a simple hosts file. It's not possible to provide a clean hosts list for Pi-hole or anything similar, because you have to block specific parts of a site and not the site/domain itself. The data could be used to create a blocklist for uBlock Origin or similar adblockers, but you'd have to put in a heck of a lot of time and effort to test it, because you have to write custom scripts to crawl and get the correct data in the correct format, and test like an idiot so you don't break the internet. Google, Facebook, Microsoft and even DuckDuckGo itself would be blocked completely if you took the data as-is, the way you'd like.
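To illustrate the difference in granularity, here is a small sketch. A hosts entry can only sink a whole hostname, while an Adblock-style network filter (the syntax uBlock Origin accepts) can target a path and limit itself to third-party requests. The helper names `hosts_line` and `adblock_rule` and the example domains are made up for illustration.

```python
def hosts_line(domain, sink="0.0.0.0"):
    """A hosts entry: every request to the hostname is blocked, whatever the path."""
    return f"{sink} {domain}"


def adblock_rule(domain, path_prefix=""):
    """An Adblock-style filter: optional path prefix, third-party requests only."""
    return f"||{domain}{path_prefix}^$third-party"


# Whole-domain block vs. a narrower rule for one resource on the same domain.
print(hosts_line("tracker.example"))
print(adblock_rule("tracker.example", "/pixel"))
```

The first form is all a DNS-level blocker like Pi-hole can express, which is why domains that serve both trackers and real content are a problem there.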

Please be more respectful. You're in a community that provides free content; you don't have to pay a single cent, and everyone does this in their free time. Not everything that shines is gold; some things are only poo.

P.S.: This doesn't mean criticism isn't welcome; it is important for everyone, and sometimes you can't see the simplest things. So please give feedback and criticism, but don't be rude, and respect the decisions of the devs.

P.P.S.: Whoever finds typos can keep them, or bring them to Germany ^^