Full Ader [1] is a project that contains the ads.txt files of the 1M most visited websites in the Alexa ranking [2].
ads.txt files (https://en.wikipedia.org/wiki/Ads.txt) are an initiative from the IAB Technology Laboratory (you can read the description on their website). An ads.txt file gives ad buyers a list of the companies that are authorized to sell ads on the website where the file is served.
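As a rough illustration of what one of these files contains, here is a minimal parsing sketch. It assumes the common record layout of comma-separated fields (ad system domain, seller account ID, relationship `DIRECT` or `RESELLER`, and an optional certification authority ID), with `#` starting a comment. The names `AdsTxtRecord` and `parse_ads_txt` are made up for this example and are not part of this repository. Variable lines such as `contact=...` and anything after a `;` are simply ignored.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system_domain: str
    seller_account_id: str
    relationship: str  # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse the data records of an ads.txt body (simplified sketch)."""
    records = []
    for raw_line in text.splitlines():
        # Drop comments and extension fields, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # variable lines (contact=...) or malformed records
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)
        )
    return records
```

Calling `parse_ads_txt(open("domain.com.txt").read())` would return one record per valid line, and an empty list for an empty file.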
The goal of ads.txt files is to fight fraud, but they can be used for many other things. This database aims to encourage you to find new uses for the collection of ads.txt files.
- What if we use the domains in the files as a list of blocked domains for AdBlockers?
- What if we make a network graph and check the hierarchies in the ad business?
- What if we compare the contents of the websites listed here with the ad companies they have?
- What if...?
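The first two ideas could start from something like the sketch below. It walks a directory of `<domain>.txt` files and yields one (site, ad domain) edge per relation. The edge list can feed a graph library, and the counts give a crude candidate blocklist. The function names and the directory argument are assumptions made for this example, and the line parsing is deliberately simplified.

```python
from collections import Counter
from pathlib import Path


def ad_domains_in(text):
    """Return the set of ad system domains named in one ads.txt body."""
    domains = set()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 3 and fields[2].upper() in ("DIRECT", "RESELLER"):
            domains.add(fields[0].lower())
    return domains


def site_to_ad_edges(directory):
    """Yield (site, ad_domain) pairs for every file in the directory."""
    for path in sorted(Path(directory).glob("*.txt")):
        site = path.name[: -len(".txt")]
        text = path.read_text(encoding="utf-8", errors="replace")
        for ad_domain in sorted(ad_domains_in(text)):
            yield site, ad_domain


def ad_domain_counts(directory):
    """Count how many sites list each ad system domain."""
    return Counter(ad_domain for _site, ad_domain in site_to_ad_edges(directory))
```

`ad_domain_counts(".").most_common(20)` would then list the ad domains that appear on the most sites.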
The database is stored in a really inefficient way: one file per website. Each file is named after its domain plus the .txt extension, for example domain.com.txt.
An empty file means that the website has no valid ads.txt file, or that its ads.txt file is empty. Empty files are kept because without them there would be no way to know whether the site was in the top 1M websites.
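Given that layout, a small sketch like the following separates the sites that have ads.txt content from the ones that only have an empty placeholder. Both function names and the directory argument are hypothetical, and "empty" is taken literally as a zero-byte file.

```python
from pathlib import Path


def domain_from_filename(filename):
    """Map a file name such as 'domain.com.txt' back to 'domain.com'."""
    suffix = ".txt"
    if not filename.endswith(suffix):
        raise ValueError(f"not a database file: {filename!r}")
    return filename[: -len(suffix)]


def split_by_content(directory):
    """Return (domains with ads.txt content, domains with an empty file)."""
    with_ads, without_ads = [], []
    for path in sorted(Path(directory).glob("*.txt")):
        domain = domain_from_filename(path.name)
        if path.stat().st_size == 0:
            without_ads.append(domain)  # ranked site, but no usable ads.txt
        else:
            with_ads.append(domain)
    return with_ads, without_ads
```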
The code for updating the database has not been written yet. The first approach was a bulk downloader that made heavy use of multithreading; it will be discarded in favor of a better downloader that can commit to this repository on its own.
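For readers who want to refresh a few files in the meantime, here is a minimal thread-pool sketch in the spirit of the first approach. It is not the project's downloader and does not cover the planned self-committing behaviour. It assumes that each file is served at `https://<domain>/ads.txt` and that any failure should produce an empty file; `fetch_ads_txt` and `download_all` are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from pathlib import Path
from urllib.request import urlopen


def fetch_ads_txt(domain, timeout=10):
    """Return the body of https://<domain>/ads.txt, or "" on any failure."""
    try:
        with urlopen(f"https://{domain}/ads.txt", timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        return ""


def download_all(domains, directory, fetch=fetch_ads_txt, workers=8):
    """Fetch every domain concurrently and write <domain>.txt files."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input domains.
        for domain, body in zip(domains, pool.map(fetch, domains)):
            (out_dir / f"{domain}.txt").write_text(body, encoding="utf-8")
```

The `fetch` parameter is there so that the network call can be swapped out, for instance to add retries or to test without touching the network.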
Footnotes
[1] Yeah... The name is a horrible joke.
[2] The ranking is obtained from this weird link I found on the Internet: https://s3.amazonaws.com/alexa-static/top-1m.csv.zip