Search Google Dorks like Chad. Based on ivan-sincek/nagooglesearch.
Tested on Kali Linux v2023.4 (64-bit).
Made for educational purposes. I hope it will help!
- How to Install
- How to Build and Install Manually
- Shortest Possible
- Basic Example: File Download
- Chad Extractor
- Advanced Example: Social Media Takeover
- Rate Limiting
- Usage
- Images
pip3 install --upgrade google-chad
playwright install chromium
Run the following commands:
git clone https://github.com/ivan-sincek/chad && cd chad
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install dist/google_chad-5.6-py3-none-any.whl
playwright install chromium
chad -q 'intitle:"index of /" intext:"parent directory"'
Did you say Metagoofil?!
mkdir downloads
chad -q "ext:pdf OR ext:docx OR ext:xlsx OR ext:pptx" -s *.example.com -tr 200 -dir downloads
-s <site> is optional. For more information, see Usage.
Chad's file download feature is based on the Python Requests library.
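Conceptually, each downloadable result is fetched with a plain HTTP GET, roughly like the following standalone sketch (the URL and file name are illustrative, not Chad's actual internals):
# hypothetical illustration of a Requests-based download; not Chad's actual code
python3 -c 'import requests; open("downloads/file.pdf", "wb").write(requests.get("https://example.com/file.pdf", timeout = 30).content)'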
Chad Extractor is a powerful tool based on Playwright's headless Chromium browser, created to efficiently scrape the web; in other words, to compensate for the Python Requests library, which cannot render JavaScript-generated HTML and is easily blocked by anti-bot solutions.
There is a built-in 4-second delay between starting each headless browser; otherwise, the run would be very resource-intensive.
Chad Extractor was mainly designed to extract and validate data from Chad results, but you can also use it to extract and validate data from plaintext files by specifying the -pt option; plaintext files are treated like server responses, and the extraction logic is applied to them immediately.
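For example, a plaintext run against a file of raw URLs might look like this (the file names are illustrative):
# hypothetical plaintext run; urls.txt is any file whose content should be scanned directly
chad-extractor -t social_media_template.json -res urls.txt -pt -o plaintext_report.json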
Prepare Google Dorks as social_media_dorks.txt file:
intext:"t.me/"
intext:"discord.com/invite/" OR intext:"discord.gg/invite/"
intext:"youtube.com/c/" OR intext:"youtube.com/channel/"
intext:"twitter.com/"
intext:"facebook.com/"
intext:"instagram.com/"
intext:"tiktok.com/"
intext:"linkedin.com/in/" OR intext:"linkedin.com/company/"
Prepare a template as social_media_template.json file:
{
"telegram":{
"extract":"t\\.me\\/(?:(?!(?:share)(?:$|(?:\\/|\\?)[^\\s]))[\\w\\d\\.\\_\\-\\+\\@]+)(?<!\\.)",
"extract_prepend":"https://",
"validate":"<meta property=\"og:title\" content=\"Telegram: Contact .+?\">"
},
"discord":{
"extract":"discord\\.(?:com|gg)\\/invite\\/[\\w\\d\\.\\_\\-\\+\\@]+(?<!\\.)",
"extract_prepend":"https://",
"validate":"Invite Invalid"
},
"youtube":{
"extract":"youtube\\.com\\/(?:c|channel)\\/[\\w\\d\\.\\_\\-\\+\\@]+(?<!\\.)",
"extract_prepend":"https://www.",
"validate":"This page isn't available\\."
},
"twitter":{
"extract":"(?<!pic\\.)twitter\\.com\\/(?:(?!(?:explore|hashtag|home|i|intent|personalization|search|share|tos|widgets\\.js|[\\w]+\\/(?:privacy|tos))(?:$|(?:\\/|\\?)[^\\s]))[\\w\\d\\.\\_\\-\\+\\@]+)(?<!\\.)",
"extract_prepend":"https://",
"validate":"This account doesn.?t exist"
},
"facebook":{
"extract":"facebook\\.com\\/(?:(?!(?:about|dialog|gaming|groups|sharer|share\\.php|terms\\.php)(?:$|(?:\\/|\\?)[^\\s]))[\\w\\d\\.\\_\\-\\+\\@]+)(?<!\\.)",
"extract_prepend":"https://www.",
"validate":"This page isn't available"
},
"instagram":{
"extract":"instagram\\.com\\/(?:(?!(?:about|accounts|ar|explore|p)(?:$|(?:\\/|\\?)[^\\s]))[\\w\\d\\.\\_\\-\\+\\@]+)(?<!\\.)",
"extract_prepend":"https://www.",
"extract_append":"/",
"validate":"Sorry, this page isn't available\\."
},
"tiktok":{
"extract":"(?<!vt\\.)tiktok\\.com\\/\\@[\\w\\d\\.\\_\\-\\+\\@]+(?<!\\.)",
"extract_prepend":"https://www.",
"validate":"<title.*> \\| TikTok<\\/title>"
},
"linkedin-company":{
"extract":"linkedin\\.com\\/company\\/[\\w\\d\\.\\_\\-\\+\\@\\&]+(?<!\\.)",
"extract_prepend":"https://hr.",
"validate":"Page not found"
},
"linkedin-user":{
"extract":"linkedin\\.com\\/in\\/[\\w\\d\\.\\_\\-\\+\\@\\&]+(?<!\\.)",
"extract_prepend":"https://hr.",
"validate":"An exact match for .+ could not be found\\."
}
}
Make sure your regular expressions return only one capturing group, e.g., [1, 2, 3, 4], and not a tuple, e.g., [(1, 2), (3, 4)].
Make sure to properly escape regular expression-specific symbols in your template file, e.g., escape dot . as \\. and forward slash / as \\/.
All regular expression searches are case-insensitive.
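As a quick standalone sanity check (illustrative only, not part of Chad), you can compare the two behaviors with Python's re module:
# two capturing groups yield tuples - wrong for the template
python3 -c 'import re; print(re.findall(r"(t\.me)/(\w+)", "see t.me/example"))'
# a single whole-pattern match yields strings - correct, and case-insensitive here
python3 -c 'import re; print(re.findall(r"t\.me/\w+", "see T.ME/EXAMPLE", re.IGNORECASE))'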
Web content fetched from the URLs in Chad results will be matched against all the regular expressions (extract attributes) in the template file in order to find as much relevant data as possible.
To extract data without validating it, omit the validate attributes from the template file as necessary.
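For example, a minimal extract-only template (a simplified variant of the Telegram entry above, with no validate attribute) could look like this:
{
   "telegram":{
      "extract":"t\\.me\\/[\\w\\d\\.\\_\\-\\+\\@]+(?<!\\.)",
      "extract_prepend":"https://"
   }
}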
chad -q social_media_dorks.txt -s *.example.com -tr 200 -o results.json
chad-extractor -t social_media_template.json -res results.json -o results_report.json
Manually check if the social media URLs under summary.validated are available for takeover:
{
"started_at":"2023-12-23 03:30:10",
"summary":{
"validated":[
"https://t.me/does_not_exist"
],
"extracted":[
"https://discord.com/invite/exists",
"https://t.me/does_not_exist",
"https://t.me/exists"
]
},
"failed":{
"validation":[],
"extraction":[]
},
"full":[
{
"url":"https://example.com/about",
"results":{
"telegram":[
"https://t.me/does_not_exist",
"https://t.me/exists"
],
"discord":[
"https://discord.com/invite/exists"
]
}
}
]
}
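To list only the takeover candidates from the report, a jq one-liner like the following should work, given the report structure shown above:
# print the summary.validated entries, one URL per line
jq -r '.summary.validated[]' results_report.json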
Prepare sites/domains/subdomains as a sites.txt file:
*.example.com
*.example.com -www
[Optional] Prepare bot-safe user agents as a user_agents.txt file, where <your-api-key> is your API key from scrapeops.io:
python3 -c 'import json, requests; open("user_agents.txt", "w").write(("\n").join(requests.get("http://headers.scrapeops.io/v1/user-agents?api_key=<your-api-key>&num_results=100", verify = False).json()["result"]))'
Automate:
mkdir chad_results
IFS=$'\n'; count=0; for site in $(cat sites.txt); do count=$((count+1)); echo "#${count} | ${site}"; chad -q social_media_dorks.txt -s "${site}" -tr 200 -a user_agents.txt -o "chad_results/results_${count}.json"; done
chad-extractor -t social_media_template.json -res chad_results -a user_agents.txt -o results_report.json -v
Google's cooling-off period can last from a few hours to a whole day.
To avoid hitting Google's rate limit with Chad, increase the minimum and maximum sleep between Google queries and/or pages, or use proxies (1)(2); keep in mind that free proxies are often unstable and frequently blocked.
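For example, a deliberately slow run with longer sleeps and a proxy list might look like this (the values are illustrative, not tuned recommendations):
# hypothetical low-and-slow run; sleep values are illustrative
chad -q social_media_dorks.txt -s "*.example.com" -min-q 120 -max-q 240 -min-p 30 -max-p 60 -x proxies.txt -o results.json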
To download a list of free proxies, run:
curl -s 'https://proxylist.geonode.com/api/proxy-list?limit=50&page=1&sort_by=lastChecked&sort_type=desc' -H 'Referer: https://proxylist.geonode.com/' | jq -r '.data[] | "\(.protocols[])://\(.ip):\(.port)"' > proxies.txt
Additionally, to avoid hitting, e.g., Instagram's rate limit with Chad Extractor, you might want to isolate Instagram into a separate run, increase the wait time, and use only one thread.
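Such an isolated run might look like this, where instagram_template.json is an assumed copy of the template containing only the instagram entry:
# hypothetical single-threaded run with a longer wait; instagram_template.json is an assumed template subset
chad-extractor -t instagram_template.json -res chad_results -th 1 -w 10 -o instagram_report.json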
Chad v5.6 ( github.com/ivan-sincek/chad )
Usage:   chad -q queries     [-s site         ] [-x proxies    ] [-o out         ]
Example: chad -q queries.txt [-s *.example.com] [-x proxies.txt] [-o results.json]
DESCRIPTION
Search Google Dorks like Chad
QUERIES
File with Google Dorks or a single query to use
-q, --queries = queries.txt | intext:password | "ext:tar OR ext:zip" | etc.
SITE
Domain[s] to search
-s, --site = example.com | sub.example.com | *.example.com | "*.example.com -www" | etc.
TIME
Get results not older than the specified time in months
-t, --time = 6 | 12 | 24 | etc.
TOTAL RESULTS
Total number of unique results
Default: 100
-tr, --total-results = 200 | etc.
PAGE RESULTS
Number of results per page - capped at 100 by Google
Default: randint(75, 100)
-pr, --page-results = 50 | etc.
MINIMUM QUERIES
Minimum sleep between Google queries
Default: 75
-min-q, --minimum-queries = 120 | etc.
MAXIMUM QUERIES
Maximum sleep between Google queries
Default: minimum + 50
-max-q, --maximum-queries = 180 | etc.
MINIMUM PAGES
Minimum sleep between Google pages
Default: 15
-min-p, --minimum-pages = 30 | etc.
MAXIMUM PAGES
Maximum sleep between Google pages
Default: minimum + 10
-max-p, --maximum-pages = 60 | etc.
USER AGENTS
File with user agents to use
Default: random
-a, --user-agents = user_agents.txt | etc.
PROXIES
File with proxies or a single proxy to use
-x, --proxies = proxies.txt | http://127.0.0.1:8080 | etc.
DIRECTORY
Downloads directory
All downloaded files will be saved in this directory
-dir, --directory = downloads | etc.
THREADS
Number of parallel files to download
Default: 5
-th, --threads = 20 | etc.
OUT
Output file
-o, --out = results.json | etc.
NO SLEEP ON START
Safety feature to prevent accidental rate limit triggering
-nsos, --no-sleep-on-start
DEBUG
Debug output
-dbg, --debug
Chad Extractor v5.6 ( github.com/ivan-sincek/chad )
Usage:   chad-extractor -t template      -res results      -o out                 [-th threads] [-r retries] [-w wait]
Example: chad-extractor -t template.json -res chad_results -o results_report.json [-th 10     ] [-r 5      ] [-w 10  ]
DESCRIPTION
Extract and validate data from Chad results or plaintext files
TEMPLATE
JSON template file with extraction and validation information
-t, --template = template.json | etc.
RESULTS
Directory containing Chad results or plaintext files, or a single file
In case of a directory, files ending with '.report.json' will be ignored
-res, --results = chad_results | results.json | urls.txt | etc.
PLAINTEXT
Treat all the results as plaintext files
-pt, --plaintext
EXCLUDES
File with regular expressions or a single regular expression to exclude the page content
Applies only on extraction
-e, --excludes = regexes.txt | "<div id=\"seo\">.+?<\/div>" | etc.
THREADS
Number of parallel headless browsers to run
Default: 4
-th, --threads = 10 | etc.
RETRIES
Number of retries per URL
Default: 2
-r, --retries = 5 | etc.
WAIT
Wait time before returning the page content
Default: 4
-w, --wait = 10 | etc.
USER AGENTS
File with user agents to use
Default: random
-a, --user-agents = user_agents.txt | etc.
PROXY
Web proxy to use
-x, --proxy = http://127.0.0.1:8080 | etc.
OUT
Output file
-o, --out = results_report.json | etc.
VERBOSE
Create additional supporting output files
-v, --verbose
DEBUG
Debug output
-dbg, --debug
Figure 1 - Single Query
Figure 2 - Multiple Queries