Web-Scraping-and-optimization-using-parallel-approach

Using Python to scrape content and parallelise the scraping using preprocessing

Scraping and Optimization

Web scraping is the process of extracting data from a set of web pages and saving it locally on your computer. It lets you automatically load multiple pages of a website and extract the data you need. Scrapers are usually custom-built for a specific website, though some can be configured to work with any website.

This project is about web scraping using Python.

We are accessing the content from the website toscrape.com.

To Scrape is an online sandbox for web scraping. The site exists to be scraped: it helps beginners learn scraping and lets developers validate their scraping tools.

Specifically, we scrape http://quotes.toscrape.com/, a website that lists quotes along with their authors and the tags associated with each quote.
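To make the three fields concrete, here is a minimal sketch of pulling a quote, its author, and its tags out of a page. It uses only Python's standard-library html.parser and is not taken from the repository's scripts. The class names (quote, text, author, tag) are an assumption about the site's markup, and the SAMPLE snippet is invented for illustration, so check both against the live page before relying on them.

```python
from html.parser import HTMLParser

# Made-up snippet that imitates the assumed markup of one quote block.
SAMPLE = """
<div class="quote">
  <span class="text">"A sample quotation."</span>
  <span>by <small class="author">Jane Doe</small></span>
  <div class="tags">
    <a class="tag" href="/tag/sample/">sample</a>
    <a class="tag" href="/tag/demo/">demo</a>
  </div>
</div>
"""


class QuoteParser(HTMLParser):
    """Collects one record per quote block: text, author, and tags."""

    def __init__(self):
        super().__init__()
        self.quotes = []    # finished and in-progress records
        self._field = None  # field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quote" in classes:
            # A new quote block starts: open a fresh record.
            self.quotes.append({"text": "", "author": "", "tags": []})
        elif not self.quotes:
            return  # ignore markup that appears before the first quote
        elif tag == "span" and "text" in classes:
            self._field = "text"
        elif tag == "small" and "author" in classes:
            self._field = "author"
        elif tag == "a" and "tag" in classes:
            self._field = "tag"

    def handle_endtag(self, tag):
        # Leaving any of the tracked elements stops text capture.
        if tag in ("span", "small", "a"):
            self._field = None

    def handle_data(self, data):
        if self._field is None:
            return
        record = self.quotes[-1]
        if self._field == "tag":
            record["tags"].append(data.strip())
        else:
            record[self._field] += data.strip()


def parse_quotes(html):
    """Return a list of {'text', 'author', 'tags'} dicts found in html."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return parser.quotes
```

Calling parse_quotes(SAMPLE) returns a single record with the quote text, the author name, and a list of two tags. A third-party parser such as Beautiful Soup would shorten this code; the standard library is used here only to keep the sketch dependency-free.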

The scripts serial.py and parallel.py scrape data from this website serially and in parallel, respectively, and we measure the time each scraper takes to finish. The scraped data is then stored in a CSV file.
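The serial-versus-parallel comparison can be sketched roughly as follows. This is not the code in serial.py or parallel.py, which is not shown here. It assumes a thread pool from concurrent.futures as the parallel mechanism, a /page/N/ URL pattern with ten pages, and a three-column CSV layout. The helper names (fetch, scrape_serial, scrape_parallel, timed, save_rows, main) are hypothetical.

```python
import csv
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Assumed pagination pattern; verify against the live site.
BASE_URL = "http://quotes.toscrape.com/page/{}/"


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def scrape_serial(urls, fetch_page=fetch):
    """Fetch pages one after another."""
    return [fetch_page(url) for url in urls]


def scrape_parallel(urls, fetch_page=fetch, workers=8):
    """Fetch pages concurrently; results keep the order of urls."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_page, urls))


def timed(scraper, urls, **kwargs):
    """Run a scraper and return (pages, elapsed_seconds)."""
    start = time.perf_counter()
    pages = scraper(urls, **kwargs)
    return pages, time.perf_counter() - start


def save_rows(rows, path):
    """Write (text, author, tags) rows to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "author", "tags"])
        writer.writerows(rows)


def main():
    # Ten pages is an assumption about the size of the site.
    urls = [BASE_URL.format(n) for n in range(1, 11)]
    _, serial_s = timed(scrape_serial, urls)
    _, parallel_s = timed(scrape_parallel, urls)
    print(f"serial: {serial_s:.2f}s  parallel: {parallel_s:.2f}s")
```

Calling main() performs the real downloads and prints both timings. Turning the downloaded pages into rows for save_rows is left out of this sketch. Threads suit this workload because the time is spent waiting on the network; a process pool would be the usual alternative if parsing, not downloading, dominated the runtime.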