A fast, concurrent web crawler that sniffs out links on a domain containing specific strings. It follows redirects and checks every URL in the redirect chain.
- Automatic sitemap.xml discovery for fast page enumeration
- Crawls all pages on a domain
- Checks every link found on each page
- Follows redirect chains and checks each URL in the chain
- Concurrent requests for fast scanning
- Live progress display showing:
  - Pages crawled
  - Links checked
  - Matches found
  - Queue size
- Detailed output showing:
  - Which page contains the matching link
  - The original link URL
  - The full redirect chain (if any)
  - Which URL in the chain matches
- Supports both sitemap indexes and regular sitemaps
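The sitemap support above (regular sitemaps plus sitemap indexes) can be sketched roughly as follows. This is a hypothetical stdlib sketch assuming the standard sitemaps.org XML namespace, not the tool's actual code:

```python
# Minimal sitemap parsing sketch: a sitemap index lists child sitemaps,
# while a regular sitemap lists page URLs directly.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text: str) -> list[str]:
    """Return page URLs from a sitemap, recursing into sitemap indexes."""
    root = ET.fromstring(xml_text)
    if root.tag.endswith("sitemapindex"):
        # Sitemap index: fetch and parse each child sitemap it lists.
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            with urllib.request.urlopen(loc.text) as resp:
                urls.extend(parse_sitemap(resp.read().decode()))
        return urls
    # Regular sitemap: the page URLs are listed directly.
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
```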
Install dependencies using uv (recommended):

```bash
uv sync
```

Or using pip:

```bash
pip install -e .
```

Basic usage:

```bash
python linkhound.py example.com "suspicious-string"
```

Search for multiple strings:

```bash
python linkhound.py example.com "malware" "phishing" "spam"
```

With options:

```bash
python linkhound.py example.com "tracker" -c 20 -t 60
```

Arguments:

- `domain`: The domain to crawl (e.g., `example.com` or `https://example.com`)
- `search_strings`: One or more strings to search for in URLs (case-insensitive)

Options:

- `-c, --concurrent`: Maximum concurrent requests (default: 10)
- `-t, --timeout`: Request timeout in seconds (default: 30)
- `-v, --verbose`: Enable verbose output for debugging (recommended, to see which links are being scanned)
- `--no-sitemap`: Skip sitemap.xml discovery and crawl pages manually
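A cap on in-flight requests like `-c` is commonly enforced with an asyncio semaphore. The sketch below is hypothetical (the `fetch` stand-in replaces a real HTTP call) and is not necessarily how LinkHound implements it:

```python
# Bound concurrent fetches with asyncio.Semaphore: at most
# max_concurrent coroutines hold the semaphore at any moment.
import asyncio

async def fetch(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a real HTTP request
    return url

async def crawl_all(urls: list[str], max_concurrent: int = 10) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        async with sem:  # blocks while max_concurrent fetches are running
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))

results = asyncio.run(
    crawl_all([f"https://example.com/p{i}" for i in range(25)], max_concurrent=5)
)
```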
Tip: Use the -v flag to see real-time progress of which pages and links are being scanned. This is especially helpful for understanding what the crawler is doing and troubleshooting any issues.
```bash
python linkhound.py myblog.com "affiliate" "ref=" "partner"
```

```bash
python linkhound.py example.com "utm_" "fbclid" "gclid"
```

```bash
python linkhound.py example.com "bit.ly" "tinyurl" "redirect"
```

```bash
python linkhound.py example.com "tracker" -v
```

The last example shows real-time output of every page and link being scanned, helping you understand the crawler's progress and troubleshoot any issues.
- Sitemap Discovery: Automatically checks for sitemap.xml or sitemap_index.xml to quickly discover all pages
- Crawling: Crawls all discovered pages (from sitemap or by following links)
- Link Extraction: Extracts all links from each page (both internal and external)
- Link Checking: For each unique link:
  - Makes a request without following redirects
  - Checks if the URL contains any search strings
  - If it's a redirect, follows to the next URL
  - Repeats until reaching the final destination or 10 redirects
- Reporting: Shows all matches with their source page and redirect chain
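The link-checking steps above can be sketched with the standard library: request each URL without auto-following redirects, test every hop against the search strings (case-insensitively), and walk `Location` headers for up to 10 hops. This is a hypothetical sketch, not necessarily the tool's actual code:

```python
# Walk a redirect chain manually, matching each hop's URL.
import urllib.error
import urllib.request
from urllib.parse import urljoin

def url_matches(url: str, search_strings: list[str]) -> bool:
    """Case-insensitive substring match against any search string."""
    return any(s.lower() in url.lower() for s in search_strings)

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise instead of follow

_opener = urllib.request.build_opener(_NoRedirect)

def check_link(url: str, search_strings: list[str], max_hops: int = 10):
    chain, matches = [], []
    for _ in range(max_hops):
        chain.append(url)
        if url_matches(url, search_strings):
            matches.append(url)  # this hop's URL contains a search string
        try:
            _opener.open(url, timeout=30)
            break  # 2xx response: final destination reached
        except urllib.error.HTTPError as e:
            if e.code in (301, 302, 303, 307, 308) and "Location" in e.headers:
                url = urljoin(url, e.headers["Location"])  # may be relative
            else:
                break  # non-redirect error ends the chain
    return chain, matches
```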
```
LinkHound
Domain: https://example.com
Looking for: tracking, analytics

Pages crawled    50
Links checked    234
Matches found    3
Queue size       5

Crawling complete!
Found 3 matching links:

1. Match found:
   Found on page: https://example.com/blog/post-1
   Link URL: https://example.com/out/link123
   Redirect chain:
     https://example.com/out/link123
     -> https://tracker.example.net/click?id=123

2. Match found:
   Found on page: https://example.com/about
   Link URL: https://analytics.service.com/track
   Matching URL: https://analytics.service.com/track
```
- Finding and auditing affiliate links
- Detecting tracking pixels and analytics
- Identifying malicious redirects
- Checking for broken or suspicious links
- Compliance audits for link policies
- SEO analysis
MIT