
To see a breakdown of domains in the Majestic Million by TLD, run:

tail -n +2 majestic_million.csv | cut -d, -f4 | sort | uniq -c | sort -r -n | less
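
The same breakdown can be sketched in Python, which may be handy when the counts feed later analysis. This is a minimal sketch, assuming the TLD sits in the fourth column (as the `cut -f4` above implies) and that the first line is a header. `tld_breakdown` is a hypothetical helper name, and the sample rows are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def tld_breakdown(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count rows per TLD, most common first (like cut | sort | uniq -c | sort -rn)."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, like `tail -n +2`
    # Assumption: column 4 (index 3) holds the TLD.
    counts = Counter(row[3] for row in reader if len(row) > 3)
    return counts.most_common()


# Made-up sample rows in the assumed column order (rank, TLD rank, domain, TLD).
sample = [
    "GlobalRank,TldRank,Domain,TLD",
    "1,1,example.com,com",
    "2,1,example.org,org",
    "3,2,example2.com,com",
]
for tld, n in tld_breakdown(sample):
    print(f"{n:8d} {tld}")
```

To run it against the real file, pass an open file object instead of `sample`, for example `open("majestic_million.csv", newline="")`.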

Zone file access

All gTLD registries are governed by ICANN and must provide some form of zone file access:

Anecdotally, some registries try to impose extra terms. If that happens, file a complaint:

Data collection from MM

cat majestic_million.csv | python3 > majestic_all_possible_domains
pv -l -s $(wc -l < majestic_all_possible_domains) majestic_all_possible_domains | parallel --will-cite -j 20 -- dig {} @ NS | gzip > dig.output.gz

Ranking, statistics, and graphs

gzcat dig.output.gz | pv -l | python3 > raw_results.csv
tail -n+2 raw_results.csv | sort -t, -k1,1 | pv -l > sorted_results.csv
tail -n+2 majestic_million.csv | sort -t, -k3,3 | pv -l > sorted_majestic.csv
join -t, -o 2.1,0,1.2,1.3,1.4,1.5 -1 1 -2 3 sorted_results.csv sorted_majestic.csv | sort -t, -k1 -n | pv -l | cat <(echo $'Majestic Million Rank,Domain,Num NS records,Num glue records,Num out-of-bailiwick glue,Num loose-out-bailiwick glue') - > collated_results.csv
cat collated_results.csv | python3
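
To make the `join` step above easier to follow, here is a rough Python equivalent. It is only a sketch under stated assumptions: the results rows carry the domain in column 1 followed by four count columns, the Majestic rows carry the rank in column 1 and the domain in column 3, and both inputs are already stripped of their header lines. `collate` is a hypothetical name and is not one of the scripts used in the pipeline.

```python
from typing import Dict, Iterable, List

HEADER = [
    "Majestic Million Rank", "Domain", "Num NS records", "Num glue records",
    "Num out-of-bailiwick glue", "Num loose-out-bailiwick glue",
]


def collate(results: Iterable[List[str]], majestic: Iterable[List[str]]) -> List[List[str]]:
    """Inner-join results with Majestic rows on the domain, sorted by rank."""
    # Map domain (column 3) to rank (column 1), like `join -2 3 ... -o 2.1`.
    rank_by_domain: Dict[str, int] = {row[2]: int(row[0]) for row in majestic}
    rows = [
        (rank_by_domain[r[0]], r[0], *r[1:5])  # same fields as -o 2.1,0,1.2,1.3,1.4,1.5
        for r in results
        if r[0] in rank_by_domain  # domains missing from either side are dropped
    ]
    rows.sort(key=lambda row: row[0])  # numeric sort on rank, like `sort -n`
    return [HEADER] + [[str(rank), *rest] for rank, *rest in rows]
```

Unlike `join`, the dictionary keeps only one Majestic row per domain, so this assumes domains are unique in that file. It also does not need pre-sorted input, which is the main practical difference from the shell version.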

Sanity checks

# check the number of NS records
zgrep -E -e '^[^;]+[[:space:]]NS[[:space:]]' dig.output.gz | wc -l
# check the most common nameservers
zgrep -E -e '^[^;]+[[:space:]]NS[[:space:]]' dig.output.gz | tr ' ' $'\t' | tr -s $'\t' | cut -f 5 | sort | uniq -c | sort -r -n > ns.popular
# check for NS records that are actually IP addresses
zgrep -E -e '^[^;]+[[:space:]]NS[[:space:]]' dig.output.gz | tr ' ' $'\t' | tr -s $'\t' | cut -f 5 | grep -E -e '^[0-9.]+$' | wc -l
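
The three checks above can also be sketched in Python in a single pass. This is a minimal sketch, assuming dig's usual answer layout of `name TTL class type rdata` separated by tabs or spaces. It is slightly stricter than the grep, since it requires `NS` to be the fourth field. `ns_targets` and `summarize` are hypothetical names, and the sample lines are made up.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

IP_LIKE = re.compile(r"^[0-9.]+$")  # digits and dots only, like the grep above


def ns_targets(lines: Iterable[str]) -> Iterator[str]:
    """Yield the target of every NS record, skipping ';' comment lines."""
    for line in lines:
        if line.startswith(";"):
            continue
        fields = line.split()  # splits on runs of tabs or spaces
        # Assumed layout: name TTL class type rdata
        if len(fields) >= 5 and fields[3] == "NS":
            yield fields[4]


def summarize(lines: Iterable[str]) -> Tuple[int, Counter, List[str]]:
    """Return (number of NS records, nameserver popularity, IP-like targets)."""
    targets = list(ns_targets(lines))
    return len(targets), Counter(targets), [t for t in targets if IP_LIKE.match(t)]


# Made-up sample in dig's answer format.
sample = [
    "; sample comment line",
    "example.com.\t\t172800\tIN\tNS\tns1.example.net.",
    "example.com.\t\t172800\tIN\tNS\tns2.example.net.",
    "example.org.\t\t172800\tIN\tNS\tns1.example.net.",
    "example.net.\t\t3600\tIN\tNS\t192.0.2.53.",
    "ns1.example.net.\t172800\tIN\tA\t192.0.2.1",
    "",
]
total, popular, ip_like = summarize(sample)
print(total, popular.most_common(1), ip_like)
```

To run it on the real capture, pass `gzip.open("dig.output.gz", "rt")` from the standard library in place of `sample`.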

Full pipeline (for video)

join -t, -o 2.1,0,1.2,1.3,1.4,1.5 -1 1 -2 3 <(cat majestic_million.csv | head -n 1000 | python3 | parallel --will-cite -j 20 -- dig {} @ NS | python3 | tail -n+2 | sort -t, -k1,1) <(tail -n+2 majestic_million.csv | sort -t, -k3,3) | sort -t, -k1 -n | cat <(echo $'Majestic Million Rank,Domain,Num NS records,Num glue records,Num out-of-bailiwick glue,Num loose-out-bailiwick glue') - | python3