Crawler that collects all the external links from a website
$ pip install -r requirements.txt
$ python scan.py {URL}
- The script scans the page and collects all the URLs from the page and its corresponding JS files
- Same-domain links are stored in memory so they can be scanned further
- It keeps only the selected links, as configured in scan.py
- The script creates two output files:
  - output.txt: contains all the links that were found after filtering
  - broken.txt: contains the links from the above list that are broken
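The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual scan.py: the `fetch` callable, the `link_filter` hook, and the `max_pages` cap are assumptions added here to keep the sketch small and testable, and links are extracted with a regex rather than a full HTML parser.

```python
import re
from urllib.parse import urljoin, urlparse

def extract_links(html, base_url):
    # Pull href values out of the page and resolve them to absolute URLs.
    # A regex keeps the sketch dependency-free; real code would use an HTML parser.
    hrefs = re.findall(r'href=["\'](.*?)["\']', html)
    return [urljoin(base_url, h) for h in hrefs
            if h and not h.startswith(("#", "mailto:"))]

def is_same_domain(url, base_url):
    # Same-domain links are the ones we keep crawling into.
    return urlparse(url).netloc == urlparse(base_url).netloc

def crawl(start_url, fetch, link_filter=lambda u: True, max_pages=50):
    # fetch(url) -> (status_code, html); injected so the sketch stays testable.
    to_visit, seen, found = [start_url], set(), []
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        status, html = fetch(url)
        if status >= 400:
            continue  # unreachable page; nothing to scan
        for link in extract_links(html, url):
            if link_filter(link) and link not in found:
                found.append(link)  # goes to output.txt
            if is_same_domain(link, start_url):
                to_visit.append(link)  # same-domain pages get scanned too
    # Re-check every collected link; failures go to broken.txt.
    broken = [u for u in found if fetch(u)[0] >= 400]
    return found, broken
```

In the real script, `found` and `broken` would then be written out line by line to output.txt and broken.txt respectively.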