Crawler that collects all the external links from a website
$ pip install -r requirements.txt
$ python scan.py {URL}
- The script scans the page and collects all the URLs from the page and its corresponding JS files
- Same-domain links are stored in memory so they can be scanned further
- It keeps only the selected links, as configured in scan.py
- The script creates two output files:
  - output.txt: contains all the links that were found after filtering
  - broken.txt: contains the links from the above list that are broken
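The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual scan.py: the `fetch` callable, the `link_filter` hook, and the `max_pages` cap are assumptions added here to keep the sketch small and testable, and links are extracted with a regex rather than a full HTML parser.

```python
import re
from urllib.parse import urljoin, urlparse

def extract_links(html, base_url):
    # Pull href values out of the page and resolve them to absolute URLs.
    # A regex keeps the sketch dependency-free; real code would use an HTML parser.
    hrefs = re.findall(r'href=["\'](.*?)["\']', html)
    return [urljoin(base_url, h) for h in hrefs
            if h and not h.startswith(("#", "mailto:"))]

def is_same_domain(url, base_url):
    # Same-domain links are the ones we keep crawling into.
    return urlparse(url).netloc == urlparse(base_url).netloc

def crawl(start_url, fetch, link_filter=lambda u: True, max_pages=50):
    # fetch(url) -> (status_code, html); injected so the sketch stays testable.
    to_visit, seen, found = [start_url], set(), []
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        status, html = fetch(url)
        if status >= 400:
            continue  # unreachable page; nothing to scan
        for link in extract_links(html, url):
            if link_filter(link) and link not in found:
                found.append(link)  # goes to output.txt
            if is_same_domain(link, start_url):
                to_visit.append(link)  # same-domain pages get scanned too
    # Re-check every collected link; failures go to broken.txt.
    broken = [u for u in found if fetch(u)[0] >= 400]
    return found, broken
```

In the real script, `found` and `broken` would then be written out line by line to output.txt and broken.txt respectively.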