Python Scripts for Crawling Websites.
The code is written in Python and implements a breadth-first search (BFS) crawl.
The script can fetch all the URLs associated with a website.
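
As an illustration of the breadth-first approach, here is a minimal sketch using only the standard library. It is not the repository's actual code: the names `LinkParser`, `crawl_bfs`, `fetch_http` and the `max_pages` limit are hypothetical, and the sketch assumes that "associated with the website" means links on the same host as the start URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_http(url):
    """Return the page body as text, or None if the request fails."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return None


def crawl_bfs(start_url, fetch=fetch_http, max_pages=100):
    """Visit same-host pages in breadth-first order and return their URLs."""
    site = urlparse(start_url).netloc
    seen = {start_url}           # URLs already queued, to avoid revisits
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        html = fetch(url)
        if not html:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            # Assumption: only links on the start URL's host are followed.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

The `fetch` parameter is injectable so the traversal can be tried against an in-memory dictionary of pages before pointing it at a live site, for example `crawl_bfs("http://example.com/", fetch=pages.get)`.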
This crawler takes robots.txt as input, then fetches and parses the sitemap XML. It does not visit the individual pages listed in the sitemap; it only counts and categorizes them.
Its purpose is to estimate the number of URLs on the website.
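
The counting step could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: the function names `sitemap_urls_from_robots` and `count_sitemap` are hypothetical, and since the description does not say how URLs are categorized, the sketch assumes grouping by the first path segment. The `fetch` argument is any callable that returns the XML text for a URL.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls_from_robots(robots_text):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def count_sitemap(sitemap_url, fetch, counts=None):
    """Count sitemap entries per category without visiting the pages."""
    counts = Counter() if counts is None else counts
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == NS + "sitemapindex":
        # A sitemap index points at further sitemaps; recurse into each.
        for loc in root.iter(NS + "loc"):
            count_sitemap(loc.text.strip(), fetch, counts)
    else:
        for loc in root.iter(NS + "loc"):
            path = urlparse(loc.text.strip()).path
            # Assumption: the category is the first path segment.
            first = path.strip("/").split("/")[0]
            counts[first or "(root)"] += 1
    return counts
```

Summing the values of the returned `Counter` gives the estimated URL count, and the keys give the per-category breakdown.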
This crawler takes either robots.txt or sitemap.xml as input and parses the fetched pages using XPath. Its purpose is to gather data from a website (if you have permission).
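
To give an idea of the XPath step, here is a small sketch. It is hypothetical and not taken from the repository: the `extract` function and the field names are made up for illustration, and it uses the limited XPath subset of the standard library's `xml.etree.ElementTree`, which only accepts well-formed markup. Real-world HTML is often not well-formed, so a third-party parser with fuller XPath support would probably be needed in practice.

```python
import xml.etree.ElementTree as ET


def extract(page_source, xpaths):
    """Map each field name to the text of all nodes matching its XPath.

    Assumption: page_source is well-formed XML/XHTML, and each expression
    stays within the XPath subset that ElementTree supports.
    """
    root = ET.fromstring(page_source)
    result = {}
    for field, xpath in xpaths.items():
        result[field] = [
            "".join(node.itertext()).strip() for node in root.findall(xpath)
        ]
    return result


# Hypothetical page and field definitions, for illustration only.
page = (
    "<html><head><title>Blue Widget</title></head>"
    "<body><h1>Blue Widget</h1>"
    "<span class='price'>9.99</span><span class='note'>In stock</span>"
    "</body></html>"
)
fields = {"title": ".//title", "price": ".//span[@class='price']"}
data = extract(page, fields)
```

Keeping the XPath expressions in a dictionary separate from the parsing code makes it easy to adjust them per site without touching the extraction logic.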