Web-Crawling

Python Scripts for Crawling Websites.

Project 1: URL Crawler

The code is written in Python and implements a breadth-first search (BFS) over the site's links.

The script fetches all the URLs associated with the website.
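The breadth-first idea can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual script. The names `LinkParser`, `fetch_html`, `crawl`, and `max_pages` are hypothetical, and it assumes that "associated with the website" means links on the same host as the start URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher (hypothetical): page text, or "" if the request fails."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, fetch=fetch_html, max_pages=100):
    """Visit same-host pages in breadth-first order and return their URLs."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    seen = {start_url}           # URLs already queued, to avoid revisits
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing the fetcher as a parameter keeps the traversal separate from the network code, so the queue logic can be exercised against an in-memory dictionary of pages. A real crawl would probably also want rate limiting and robots.txt checks, which this sketch omits.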

Project 2: Sitemap XML Parser

This crawler takes robots.txt as input, then fetches and parses the sitemap XML. It does not visit the individual pages listed in the sitemap; it only counts and categorizes them.

The purpose is to estimate the URL count of the website.
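A rough sketch of this flow, under stated assumptions: sitemaps are discovered from `Sitemap:` lines in robots.txt, sitemap index files are followed recursively, and URLs are categorized by their first path segment. The categorization rule and the names `fetch_bytes`, `count_sitemap`, and `estimate_url_count` are illustrative guesses, not taken from the project.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher (hypothetical): raw response body."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def sitemap_urls_from_robots(robots_text):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    return re.findall(r"(?im)^\s*sitemap:\s*(\S+)", robots_text)


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def child_locs(root):
    """Yield the <loc> text of each direct <url> or <sitemap> entry."""
    for entry in root:
        for el in entry:
            if local_name(el.tag) == "loc" and el.text:
                yield el.text.strip()


def count_sitemap(sitemap_url, fetch=fetch_bytes, counts=None):
    """Count listed URLs per first path segment without visiting them."""
    counts = Counter() if counts is None else counts
    root = ET.fromstring(fetch(sitemap_url))
    if local_name(root.tag) == "sitemapindex":
        # An index file lists further sitemaps; follow each one.
        for child_url in child_locs(root):
            count_sitemap(child_url, fetch, counts)
    else:
        for page_url in child_locs(root):
            first = urlparse(page_url).path.strip("/").split("/")[0]
            counts[first or "(root)"] += 1
    return counts


def estimate_url_count(robots_url, fetch=fetch_bytes):
    """Read robots.txt, then tally every sitemap it declares."""
    robots_text = fetch(robots_url).decode("utf-8", errors="replace")
    counts = Counter()
    for sitemap_url in sitemap_urls_from_robots(robots_text):
        count_sitemap(sitemap_url, fetch, counts)
    return counts
```

Summing the counter's values gives the overall estimate. The sketch does not handle gzip-compressed sitemaps or index files that reference each other in a loop.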

Project 3: Sitemap Parser

This crawler takes either robots.txt or sitemap.xml as input and parses the fetched pages using XPath. The purpose is to gather data from a website (if you have permission).
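The XPath extraction step might look something like the sketch below. It uses the limited XPath subset in the standard library's `xml.etree.ElementTree`, which only parses well-formed markup; real-world HTML usually needs a more forgiving parser such as `lxml.html`, and which library the project uses is not stated here. The function name `extract_fields` and the example field paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_fields(page_markup, field_paths):
    """Return {field name: [text of each element matching its XPath]}.

    Assumes page_markup is well-formed XML/XHTML and that each path uses
    only the XPath subset ElementTree supports.
    """
    root = ET.fromstring(page_markup)
    record = {}
    for name, path in field_paths.items():
        # itertext() gathers text from nested tags such as <b> or <span>.
        record[name] = [
            "".join(el.itertext()).strip() for el in root.findall(path)
        ]
    return record


# Usage with a made-up page and made-up field paths.
page = (
    "<html><head><title>Widget</title></head><body>"
    "<h1>Blue Widget</h1><p class='price'>9.99</p>"
    "<p class='note'>In <b>stock</b></p></body></html>"
)
paths = {
    "title": "./head/title",
    "price": ".//p[@class='price']",
    "note": ".//p[@class='note']",
}
record = extract_fields(page, paths)
```

Keeping the field paths in a dictionary means the same function can be pointed at a different site by changing only the paths, not the parsing code.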