This collection is a curated list of websites that use the `robots.txt` file to restrict access by AI agents, AI crawlers, and GPTs. It will be updated monthly.
The `robots.txt` file lets website owners control and limit which areas of their site these user agents may access, by specifying rules and directives.
# OpenAI's web crawler: GPT3.5, GPT4, ChatGPT
# https://platform.openai.com/docs/bots
User-agent: GPTBot
# ChatGPT plugins
# https://platform.openai.com/docs/bots
User-agent: ChatGPT-User
# OpenAI Search bot
# https://platform.openai.com/docs/bots
User-agent: OAI-SearchBot
# Google's web crawler: Bard, VertexAI, Gemini
# https://blog.google/technology/ai/an-update-on-web-publisher-controls/
User-agent: Google-Extended
# Apple's web crawler, dedicated to GenAI projects
# https://support.apple.com/en-us/119829
User-agent: Applebot-Extended
# Claude
User-agent: anthropic-ai
# Claude Bot
User-agent: ClaudeBot
# Claude web
User-agent: Claude-Web
# Cohere
User-agent: Cohere-ai
# Perplexity
User-agent: PerplexityBot
# Common Crawl
# https://commoncrawl.org/ccbot
User-agent: CCBot
# Omgilibot: webz.io
# https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/
User-agent: Omgilibot
User-agent: Omgili
User-agent: Webzio-Extended
# Facebook: Llama
# https://developers.facebook.com/docs/sharing/bot/
User-agent: FacebookBot
# ByteDance: Doubao
User-agent: Bytespider
# Censorship area
Disallow: /
Please note that this blocklist is provided for informational purposes only. Despite the provocative project name, it is perfectly fine to disallow web crawling and protect content ownership.
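To make the blocklist above concrete, here is a minimal sketch of how one might check which AI user agents a given `robots.txt` blocks. It uses Python's standard `urllib.robotparser`; the `blocked_agents` helper, the `SAMPLE` text, and the shortened agent list are illustrative assumptions, not the logic of this project's `scrape.py`.

```python
from urllib.robotparser import RobotFileParser

# A subset of the user agents from the blocklist above (illustrative only).
AI_USER_AGENTS = ("GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "CCBot")


def blocked_agents(robots_txt, agents=AI_USER_AGENTS, path="/"):
    """Return the agents that the given robots.txt text forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt: two AI crawlers are blocked, everyone else is allowed.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_agents(SAMPLE))  # ['GPTBot', 'CCBot']
```

Keep in mind that `robots.txt` is a convention that crawlers choose to honor, so a check like this only tells you what the site asks for.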
- Scanned: 66
- ✅ Passing: 41 %
- 🔒 Blocked: 59 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
The Times | 🇬🇧 | 🔒 |
BBC | 🇬🇧 | 🔒 |
The Guardian | 🇬🇧 | 🔒 |
The Economist | 🇬🇧 | 🔒 |
Financial Times | 🇬🇧 | 🔒 |
The Independent | 🇬🇧 | ✅ |
The Telegraph | 🇬🇧 | 🔒 |
Daily Mail | 🇬🇧 | 🔒 |
The Sun | 🇬🇧 | 🔒 |
Daily Mirror | 🇬🇧 | ✅ |
Daily Express | 🇬🇧 | ✅ |
Washington Post | 🇺🇸 | 🔒 |
USA Today | 🇺🇸 | ✅ |
Fox News | 🇺🇸 | ✅ |
ABC News | 🇺🇸 | 🔒 |
NBC News | 🇺🇸 | 🔒 |
CBS News | 🇺🇸 | 🔒 |
Los Angeles Times | 🇺🇸 | 🔒 |
Chicago Tribune | 🇺🇸 | ✅ |
New York Post | 🇺🇸 | 🔒 |
New York Daily News | 🇺🇸 | ✅ |
The New Yorker | 🇺🇸 | 🔒 |
Vice | 🇺🇸 | ✅ |
New York Times | 🇺🇸 | 🔒 |
Wall Street Journal | 🇺🇸 | 🔒 |
CNN | 🇺🇸 | 🔒 |
El País | 🇪🇸 | ✅ |
Süddeutsche Zeitung | 🇩🇪 | 🔒 |
Der Spiegel | 🇩🇪 | 🔒 |
Corriere della Sera | 🇮🇹 | 🔒 |
La Repubblica | 🇮🇹 | 🔒 |
Le Monde | 🇫🇷 | 🔒 |
Libération | 🇫🇷 | 🔒 |
Le Figaro | 🇫🇷 | 🔒 |
20 Minutes | 🇫🇷 | 🔒 |
Ouest France | 🇫🇷 | 🔒 |
Le Parisien | 🇫🇷 | 🔒 |
L'Equipe | 🇫🇷 | 🔒 |
Le Point | 🇫🇷 | 🔒 |
Marianne | 🇫🇷 | 🔒 |
Le Nouvel Observateur | 🇫🇷 | 🔒 |
L'Express | 🇫🇷 | 🔒 |
France 24 | 🇫🇷 | 🔒 |
BFMTV | 🇫🇷 | 🔒 |
CNews | 🇫🇷 | ✅ |
Le Monde Diplomatique | 🇫🇷 | ✅ |
Mediapart | 🇫🇷 | 🔒 |
Courrier International | 🇫🇷 | 🔒 |
Brut | 🇫🇷 | ✅ |
IMDB | 🌍 | ✅ |
Allocine | 🇫🇷 | ✅ |
Fakt | 🇵🇱 | ✅ |
Super Express | 🇵🇱 | ✅ |
Gazeta Wyborcza | 🇵🇱 | 🔒 |
Rzeczpospolita | 🇵🇱 | ✅ |
Dziennik Gazeta Prawna | 🇵🇱 | ✅ |
Polityka | 🇵🇱 | ✅ |
Newsweek Polska | 🇵🇱 | ✅ |
Gość Niedzielny | 🇵🇱 | ✅ |
Sieci | 🇵🇱 | ✅ |
Do Rzeczy | 🇵🇱 | ✅ |
Twój Styl | 🇵🇱 | ✅ |
Zwierciadło | 🇵🇱 | ✅ |
Wysokie Obcasy Extra | 🇵🇱 | 🔒 |
Pani | 🇵🇱 | ✅ |
Elle | 🇵🇱 | ✅ |
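The summary bullets that precede each table can presumably be derived from the per-site statuses. The following is a minimal sketch, assuming plain rounding to the nearest integer; the `summarize` helper and the lowercase status labels are hypothetical and not taken from `scrape.py`.

```python
from collections import Counter


def summarize(statuses):
    """Build the summary bullet lines from a list of per-site status labels."""
    counts = Counter(statuses)
    total = len(statuses)
    lines = [f"- Scanned: {total}"]
    for label in ("passing", "blocked", "unknown"):
        # Guard against an empty category list to avoid dividing by zero.
        share = round(100 * counts[label] / total) if total else 0
        lines.append(f"- {label.capitalize()}: {share} %")
    return lines


if __name__ == "__main__":
    # 27 passing and 39 blocked sites, as in the 66-row table above.
    for line in summarize(["passing"] * 27 + ["blocked"] * 39):
        print(line)
```

With 27 passing and 39 blocked sites, this reproduces the 41 % and 59 % figures shown for the 66-site table.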
- Scanned: 9
- ✅ Passing: 56 %
- 🔒 Blocked: 44 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Prime Video | 🌍 | ✅ |
Netflix | 🌍 | ✅ |
Disney+ | 🌍 | 🔒 |
Hulu | 🇺🇸 | 🔒 |
HBO Max | 🇺🇸 | ✅ |
Canal+ | 🇫🇷 | 🔒 |
FranceTV | 🇫🇷 | ✅ |
TF1 | 🇫🇷 | 🔒 |
6Play | 🇫🇷 | ✅ |
- Scanned: 6
- ✅ Passing: 67 %
- 🔒 Blocked: 33 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Soundcloud | 🌍 | 🔒 |
Youtube | 🌍 | ✅ |
Apple Music | 🌍 | ✅ |
Spotify | 🌍 | 🔒 |
Deezer | 🇫🇷 | ✅ |
LastFM | 🇬🇧 | ✅ |
- Scanned: 8
- ✅ Passing: 75 %
- 🔒 Blocked: 25 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Google Podcasts | 🌍 | ✅ |
Apple Podcast | 🌍 | ✅ |
Spotify Podcaster | 🌍 | 🔒 |
Buzzsprout | 🌍 | ✅ |
Podbean | 🌍 | ✅ |
Acast | 🇬🇧 | ✅ |
AudioMeans | 🇫🇷 | ✅ |
Radio France | 🇫🇷 | 🔒 |
- Scanned: 6
- ✅ Passing: 67 %
- 🔒 Blocked: 33 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
PornHub | 🌍 | 🔒 |
YouPorn | 🌍 | 🔒 |
Xnxx | 🌍 | ✅ |
Xvideos | 🌍 | ✅ |
Xhamster | 🌍 | ✅ |
OnlyFans | 🌍 | ✅ |
- Scanned: 5
- ✅ Passing: 100 %
- 🔒 Blocked: 0 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Bible | 🇺🇸 | ✅ |
Bible gateway | 🇺🇸 | ✅ |
Jehovah's Witnesses | 🇺🇸 | ✅ |
Vatican | 🇻🇦 | ✅ |
Islamweb | 🌍 | ✅ |
- Scanned: 13
- โ Passing: 31 %
- ๐ Blocked: 62 %
- โ Unknown: 8 %
Name | Country | Status |
---|---|---|
๐ | ๐ | |
๐ | ๐ | |
๐ | โ | |
Hacker News | ๐ | โ |
Lobsters | ๐ | ๐ |
๐ | ๐ | |
TikTok | ๐ | โ |
๐ | ๐ | |
๐ | โ | |
Quora | ๐ | ๐ |
VK | ๐ท๐บ | โ |
TripAdvisor | ๐ | ๐ |
Yelp | ๐ | ๐ |
- Scanned: 42
- โ Passing: 71 %
- ๐ Blocked: 19 %
- โ Unknown: 10 %
Name | Country | Status |
---|---|---|
Michael Jackson | ๐บ๐ธ | โ |
Madonna | ๐บ๐ธ | โ |
Taylor Swift | ๐บ๐ธ | ๐ |
Rihanna | ๐บ๐ธ | โ |
Bruno Mars | ๐บ๐ธ | โ |
Justin Bieber | ๐บ๐ธ | ๐ |
Beyoncé | ๐บ๐ธ | โ |
Katy Perry | ๐บ๐ธ | ๐ |
Lady Gaga | ๐บ๐ธ | ๐ |
Hardwell | ๐บ๐ธ | โ |
Dimitri Vegas & Like Mike | ๐บ๐ธ | โ |
Kanye West | ๐บ๐ธ | โ |
Black Eyed Peas | ๐บ๐ธ | โ |
Imagine Dragons | ๐บ๐ธ | โ |
Twenty One Pilots | ๐บ๐ธ | โ |
Maroon 5 | ๐บ๐ธ | ๐ |
Selena Gomez | ๐บ๐ธ | ๐ |
Usher | ๐บ๐ธ | ๐ |
Stromae | ๐ง๐ช | โ |
Aya Nakamura | ๐ซ๐ท | โ |
Soprano | ๐ซ๐ท | โ |
Johnny Hallyday | ๐ซ๐ท | โ |
Grand Corps Malade | ๐ซ๐ท | โ |
Zaho | ๐ซ๐ท | โ |
Jean Louis Aubert | ๐ซ๐ท | โ |
Camelia Jordana | ๐ซ๐ท | โ |
Indochine | ๐ซ๐ท | โ |
Tryo | ๐ซ๐ท | โ |
David Guetta | ๐ซ๐ท | โ |
Mc Solaar | ๐ซ๐ท | โ |
Zaz | ๐ซ๐ท | โ |
Christine and the Queens | ๐ซ๐ท | โ |
Boulevard des Airs | ๐ซ๐ท | โ |
Calogero | ๐ซ๐ท | โ |
Hoshi | ๐ซ๐ท | โ |
Avicii | ๐ธ๐ช | โ |
Adele | ๐ฌ๐ง | โ |
Calvin Harris | ๐ฌ๐ง | โ |
Ed Sheeran | ๐ฌ๐ง | โ |
Arctic Monkeys | ๐ฌ๐ง | โ |
Coldplay | ๐ฌ๐ง | โ |
The Weeknd | ๐จ๐ฆ | ๐ |
- Scanned: 3
- ✅ Passing: 100 %
- 🔒 Blocked: 0 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
White House | 🇺🇸 | ✅ |
Elysée | 🇫🇷 | ✅ |
Europe | 🇪🇺 | ✅ |
- Scanned: 28
- ✅ Passing: 89 %
- 🔒 Blocked: 11 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Google Scholar | 🌍 | ✅ |
Sci-Hub | 🌍 | ✅ |
PubPeer | 🌍 | ✅ |
Scopus | 🇳🇱 | 🔒 |
Elsevier | 🇳🇱 | 🔒 |
ScienceDirect | 🇳🇱 | ✅ |
MDPI | 🇨🇭 | ✅ |
Springer | 🇩🇪 | ✅ |
Wiley | 🇺🇸 | ✅ |
American Chemical Society | 🇺🇸 | ✅ |
PubMed | 🇺🇸 | ✅ |
Academia | 🇺🇸 | ✅ |
Science | 🇺🇸 | 🔒 |
ArXiv | 🇺🇸 | ✅ |
American Physical Society | 🇺🇸 | ✅ |
Mendeley | 🇬🇧 | ✅ |
Nature | 🇬🇧 | ✅ |
Taylor & Francis | 🇬🇧 | ✅ |
Oxford University Press | 🇬🇧 | ✅ |
Cambridge University Press | 🇬🇧 | ✅ |
Royal Society of Chemistry | 🇬🇧 | ✅ |
ResearchGate | 🇩🇪 | ✅ |
BNF | 🇫🇷 | ✅ |
Cairn | 🇫🇷 | ✅ |
Persee | 🇫🇷 | ✅ |
Gallica | 🇫🇷 | ✅ |
HAL | 🇫🇷 | ✅ |
OpenEdition | 🇫🇷 | ✅ |
- Scanned: 3
- ✅ Passing: 67 %
- 🔒 Blocked: 33 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Github | 🌍 | ✅ |
Gitlab | 🌍 | ✅ |
Stack Overflow | 🌍 | 🔒 |
- Scanned: 19
- ✅ Passing: 74 %
- 🔒 Blocked: 26 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Wikipedia | 🌍 | ✅ |
Medium | 🌍 | 🔒 |
Substack | 🌍 | ✅ |
Common Crawl | 🌍 | ✅ |
Internet Archive | 🌍 | ✅ |
Wayback Machine | 🌍 | ✅ |
Notion | 🌍 | ✅ |
Weather | 🇺🇸 | 🔒 |
AccuWeather | 🇺🇸 | ✅ |
Météo France | 🇫🇷 | ✅ |
Getty Images | 🇺🇸 | ✅ |
Shutterstock | 🇺🇸 | 🔒 |
Adobe Stock | 🇺🇸 | 🔒 |
Unsplash | 🇨🇦 | 🔒 |
Pexels | 🇩🇪 | ✅ |
Pixabay | 🇩🇪 | ✅ |
Flickr | 🇺🇸 | ✅ |
500px | 🇨🇦 | ✅ |
Giphy | 🇺🇸 | ✅ |
- Scanned: 1
- ✅ Passing: 100 %
- 🔒 Blocked: 0 %
- ❓ Unknown: 0 %
Name | Country | Status |
---|---|---|
Indeed | 🇺🇸 | ✅ |
A.k.a.: do they understand their business model?
Name | Status |
---|---|
Getty Images | ✅ |
Pexels | ✅ |
500px | ✅ |
A.k.a.: this is of public interest.
Name | Status |
---|---|
Medium | 🔒 |
Quora | 🔒 |
Elsevier | 🔒 |
Scopus | 🔒 |
Science | 🔒 |
Looking for contributions:
- Enrich website database
- Chinese websites
- New categories
Please open issues!
- Ping me on Twitter @samuelberthe (DMs, mentions, whatever :))
- Fork the project
- Fix open issues or request new features
Don't hesitate ;)
python3 -m venv venv
source ./venv/bin/activate
pip3 install -r requirements.txt
python3 scrape.py
# then copy the last version into readme
Give a ⭐️ if this project helped you!
Copyright © 2024 Samuel Berthe.
This project is MIT licensed.