duckduckgo/tracker-radar

New update for https://github.com/duckduckgo/tracker-radar/tree/main/entities

pooneh-nb opened this issue · 4 comments

I was working on a project to identify tracking/advertising domains on the Alexa Echo device. I used https://github.com/duckduckgo/tracker-radar/tree/main/entities to find the parent companies behind each domain name. Thanks for sharing such a great dataset!
I found that several domain names were missing from your dataset, so I manually looked them up through ICANN, crunchbase.com, or their websites. Since some are tracking/advertising websites, I think it would be good to update your database. Here is the update:

{'acsechocaptiveportal.com': 'Amazon Technologies, Inc.',
 'amazon-dss.com': 'Amazon Technologies, Inc.',
 'amazonalexa.com': 'Amazon Technologies, Inc.',
 'amcs-tachyon.com': 'Amazon Technologies, Inc.',
 'fireoscaptiveportal.com': 'Amazon Technologies, Inc.',
 'chtbl.com': 'Chartable Holding Inc',
 'chrt.fm': 'Chartable Holding Inc',
 'dillilabs.com': 'Dilli Labs LLC',
 'megaphone.fm': 'Spotify AB',
 'omny.fm': 'Triton Digital, Inc.',
 'podtrac.com': 'Podtrac Inc',
 'voiceapps.com': 'Voice Apps LLC',
 'mittendorf.net': 'individual',
 'doctorpooch.com': 'Dilli Labs LLC',
 'kwimer.com': 'Highwinds Network Group, Inc'}
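
For anyone who wants to double-check a list like this against a local clone of the repo, here is a minimal sketch, not the maintainers' tooling. It assumes each file in the entities directory is a JSON object with a "name" string and a "properties" list of domains; the function names `load_domain_index` and `missing_by_owner` are hypothetical and made up for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_domain_index(entities_dir):
    """Map each domain to its owner name, read from entity JSON files.

    Assumption: every file holds an object with a "name" string and a
    "properties" list of domains. Adjust the keys if the schema differs.
    """
    index = {}
    for path in sorted(Path(entities_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            entity = json.load(fh)
        for domain in entity.get("properties", []):
            # Fall back to the file name if the "name" key is absent.
            index[domain.lower()] = entity.get("name", path.stem)
    return index


def missing_by_owner(candidates, index):
    """Group candidate domains that are absent from the index by proposed owner."""
    grouped = defaultdict(list)
    for domain, owner in candidates.items():
        if domain.lower() not in index:
            grouped[owner].append(domain)
    return {owner: sorted(domains) for owner, domains in grouped.items()}
```

With the dict above bound to a variable such as `proposed`, calling `missing_by_owner(proposed, load_domain_index("tracker-radar/entities"))` would return only the domains not yet listed, grouped per company. The path is an assumption about where the clone lives.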

I'm gonna cite this dataset in our paper. Can I ask where is the source of this dataset?

Hey Pouneh, thanks a lot for sharing your findings, we really appreciate it!

> I'm gonna cite this dataset in our paper. Can I ask where is the source of this dataset?

Not sure if I understand your question, but this repo is the source. You can reference it like this:

"DuckDuckGo Tracker Radar", [online] Available: https://github.com/duckduckgo/tracker-radar, Retrieved: March 2022.

Hey Konrad, thanks for your reply.
My question was: what is the source of this dataset? For example, did you query crunchbase.com or WHOIS to find the company behind each domain name?

Ah, sorry for the misunderstanding. We use public WHOIS data and SSL cert data, and we do manual investigation (e.g. by reviewing privacy policies). We also do semi-automatic cleanup. A small portion of the data is contributed by outside contributors. LMK if that helps!

I see, that makes sense. Thank you!