Scrapper is a web scraper that can scrape and extract information from websites. It is designed to be fast and efficient, so you can gather the data you need quickly. Specifically, it is designed for:
- E.Leclerc, and in particular for scraping the details of all products in the Sports and Jewellery sections.
The data is stored locally in a MongoDB database, and some queries are written to analyze it.
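As a rough illustration of how scraped items can end up in MongoDB, here is a minimal sketch of a Scrapy item pipeline. It is not taken from this repository: the class name `MongoPipeline`, the settings `MONGO_URI` and `MONGO_DATABASE`, the database name `eleclerc` and the collection name `products` are all assumptions chosen for the example.

```python
class MongoPipeline:
    """Sketch of a Scrapy item pipeline that stores each item in MongoDB."""

    collection_name = "products"  # hypothetical collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are assumed custom entries in settings.py.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "eleclerc"),
        )

    def open_spider(self, spider):
        # Imported here so the class itself can be loaded without pymongo.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Store a copy so the item passed on to later pipelines is unchanged.
        self.collection.insert_one(dict(item))
        return item
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting of the Scrapy project; the exact module path depends on how the project is laid out.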
- Extracts data from multiple websites at once, with pagination handled by the spider.
- Easy-to-use interface with a well-organised code structure and an object-oriented design.
- Fast and efficient performance, even when handling large amounts of data.
- Scrapy's built-in features come in handy during development.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
You will need the following software installed on your computer to run Scrapper:
- Python (version 3.x or later)
- Scrapy
- PyMongo
- MongoDB Compass
- Git
- Anaconda is optional but helpful, since it makes managing virtual environments easy.
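One quick way to confirm that the Python packages listed above are importable in the active environment is a small check like the following sketch. The helper name `missing_modules` is made up for this example, and it only verifies that the modules can be found, not that MongoDB itself is running.

```python
import importlib.util

# Import names of the Python packages this project relies on.
REQUIRED_MODULES = ("scrapy", "pymongo")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when both packages are installed, and otherwise the names that still need to be installed.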
- Clone or download the repository to your local machine.
git clone https://github.com/Harsh324/Scrapper.git
- Navigate to the directory where you have cloned or extracted the repository.
cd Scrapper
- Install the required dependencies and make sure that MongoDB is running and accepting connections.
- Now that all the prerequisites are in place, run the Scrapy spider to crawl and scrape the data. Scraping all the data for the Sports and Jewellery sections takes roughly 40 to 50 minutes.
cd scrap_Eleclerc
scrapy crawl scrapit
- In the same directory there is a file named runQuery.py. Write your query there and run the file with the command:
python runQuery.py
- The result of the query is printed to the terminal.
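As an illustration of the kind of query that could be run this way, here is a minimal sketch using a MongoDB aggregation pipeline through PyMongo. It does not reproduce the contents of runQuery.py: the database name `eleclerc`, the collection name `products`, and the fields `section` and `price` (assumed to be stored as numbers) are all hypothetical.

```python
def build_section_summary_pipeline(min_price=0.0):
    """Build a pipeline that counts products and averages prices per section."""
    return [
        # Keep only products at or above the given price (assumed numeric field).
        {"$match": {"price": {"$gte": min_price}}},
        # Group by the assumed "section" field, e.g. Sports or Jewellery.
        {
            "$group": {
                "_id": "$section",
                "product_count": {"$sum": 1},
                "average_price": {"$avg": "$price"},
            }
        },
        # Largest sections first.
        {"$sort": {"product_count": -1}},
    ]


def run_query(collection, pipeline):
    """Run the pipeline on a collection and return the results as a list."""
    return list(collection.aggregate(pipeline))


def main():
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = client["eleclerc"]["products"]  # hypothetical names
    for row in run_query(collection, build_section_summary_pipeline()):
        print(row)
    client.close()
```

With MongoDB running locally, calling `main()` would print one summary document per section to the terminal.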
- Harsh - Initial work