Scrapper

Scrapper is a web scraper that can crawl websites and extract structured information from them. It is designed to be fast and efficient, so you can gather the data you need quickly and easily. Specifically, it targets the Sports and Jewelry sections of the E.Leclerc website.

The scraped data is stored locally in a MongoDB database, and a few queries are provided to analyze it.
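For reference, a Scrapy item pipeline that stores each scraped item in a local MongoDB collection might look like the sketch below. The database and collection names ("eleclerc", "products") are placeholders, not necessarily the ones used in this repository.

    # pipelines.py -- illustrative sketch, not the project's actual pipeline.
    # Assumes pymongo is installed and MongoDB is listening on localhost:27017.
    import pymongo


    class MongoPipeline:
        def __init__(self):
            self.client = pymongo.MongoClient("mongodb://localhost:27017")
            self.db = self.client["eleclerc"]  # placeholder database name

        def process_item(self, item, spider):
            # Insert every scraped item as a plain document.
            self.db["products"].insert_one(dict(item))
            return item

        def close_spider(self, spider):
            self.client.close()

A pipeline like this would be enabled through the ITEM_PIPELINES setting in the Scrapy project's settings.py.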

Features

  • Ability to extract data from multiple websites at once; pagination is handled cleanly (see the sketch after this list).
  • Easy-to-use interface with a well-organized code structure and a proper object-oriented approach.
  • Fast and efficient performance, even when handling large amounts of data.
  • Scrapy's built-in features come in very handy during development and code writing.
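As an illustration of the pagination point above, a Scrapy spider usually yields the items found on the current page and then follows the "next page" link until none is left. The start URL and CSS selectors below are placeholders, not the actual E.Leclerc selectors used by this project.

    # Pagination sketch -- URL and selectors are illustrative only.
    import scrapy


    class ProductsExampleSpider(scrapy.Spider):
        name = "products_example"
        start_urls = ["https://example.com/sports?page=1"]

        def parse(self, response):
            # One item per product card on the current page.
            for card in response.css("div.product"):
                yield {
                    "title": card.css("h2::text").get(),
                    "price": card.css(".price::text").get(),
                }

            # Follow the "next page" link, if there is one, and repeat.
            next_page = response.css("a.next::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)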

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You will need the following software installed on your computer to run Scrapper:

  • Python 3
  • Scrapy
  • MongoDB (running locally)

Installing

  1. Clone or download the repository to your local machine.
    git clone https://github.com/Harsh324/Scrapper.git
    
  2. Navigate to the directory where you have cloned or extracted the repository.
    cd Scrapper
    
  3. Install the required dependencies, then make sure that MongoDB is running and accepting connections (an example install command is shown after this list).
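If the repository does not ship a requirements file, installing Scrapy and pymongo directly should cover the spider and the query script (this assumes they are the only third-party dependencies):

    pip install scrapy pymongo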

Working

  1. Now that all the prerequisites are in place, we need to run the Scrapy spider to crawl and scrape the data. It takes around 45 - 50 minutes to scrape all the data in the Sports and Jewelry sections.

  2. Run the following commands and wait for the scraping process to finish (approximately 45 - 50 minutes):

    cd scrap_Eleclerc
    scrapy crawl scrapit
    
  3. In the same directory there is a file called runQuery.py. Write your query there and run it with the following command (a sample query is sketched after this list):

    python runQuery.py
    
  4. The result of the query will be printed to the terminal.
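As an example, a query in runQuery.py that computes the average price per category could look like the sketch below. The database, collection, and field names ("eleclerc", "products", "category", "price") are assumptions about the stored documents, so adjust them to the actual schema.

    # Sample query sketch -- the database, collection, and field names are
    # assumptions; prices are assumed to be stored as numbers.
    import pymongo

    client = pymongo.MongoClient("mongodb://localhost:27017")
    collection = client["eleclerc"]["products"]

    # Average price per category, highest first.
    pipeline = [
        {"$group": {"_id": "$category", "avg_price": {"$avg": "$price"}}},
        {"$sort": {"avg_price": -1}},
    ]

    for row in collection.aggregate(pipeline):
        print(row["_id"], row["avg_price"])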

Built With

  • Scrapy - web crawling and scraping framework
  • MongoDB - local data storage

Authors