/bookscrap

Student project #1 - Web scraping: use Python basics to create a program that automates the process of extracting, transforming, and loading data from the online library "Books to Scrape".

Web scraping

This is student project #1 of my training.
You can follow the next one here.

Table of Contents
  1. About The Project
  2. Installation
  3. Contact

About The Project

🌱 Developed skills

  • Configure a Python environment.
  • Apply the basics of Python programming.
  • Use the Git and GitHub version control system.
  • Manage data with the ETL (Extract-Transform-Load) process.
  • Use the BeautifulSoup, requests, and csv libraries.
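
As a rough illustration of the "Transform" step of ETL, here is a minimal sketch that uses only the standard library. The raw strings, the function names, and the rating word mapping are assumptions made for this example; they are not taken from the project's code, and the real pages may format these values differently.

```python
import re

# Hypothetical raw values, shaped like strings a scraper might read from a product page.
RAW_BOOK = {
    "title": "  A Light in the Attic ",
    "price_including_tax": "£51.77",
    "number_available": "In stock (22 available)",
    "review_rating": "Three",
}

# Assumed mapping from a rating word to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text: str) -> float:
    """Keep only digits and the decimal point, then convert to float."""
    return float(re.sub(r"[^0-9.]", "", text))


def parse_stock(text: str) -> int:
    """Return the first integer found in the text, or 0 if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


def transform(raw: dict) -> dict:
    """Turn raw scraped strings into typed values (the 'T' in ETL)."""
    return {
        "title": raw["title"].strip(),
        "price_including_tax": parse_price(raw["price_including_tax"]),
        "number_available": parse_stock(raw["number_available"]),
        "review_rating": RATING_WORDS.get(raw["review_rating"], 0),
    }


book = transform(RAW_BOOK)
print(book)
```

Keeping each conversion in its own small function makes it easier to adjust one rule if the site changes how a value is displayed.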

📖 Scenario

I am a marketing analyst at Books Online, a large online bookstore specializing in used books.
As part of my job, I try to manually 😔 track used book prices on competitors' websites, but it's too much work.
My team and I decided to automate this laborious task with a program (a scraper) 💡 developed in Python, which is able to extract pricing information from other online bookstores.

🚧 Project goal

Sam, my team leader, asked me to develop a beta version of this system to track book prices at Books to Scrape, an online book retailer.
In this beta version, the program will simply be an on-demand executable application aimed at retrieving prices at the time of its execution.

🚀 Deliverable

The Books to Scrape library is organized into categories, and each category contains books.
For each category, a CSV file is created at data/csv/category_name.csv with the following information for each book:

  • product_page_url
  • universal_product_code
  • title
  • price_including_tax
  • price_excluding_tax
  • number_available
  • product_description
  • category
  • review_rating
  • image_url
For each book, the related image is saved at data/images/category_name/book_name.jpg
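
The layout above could be produced along these lines. This is only a sketch under stated assumptions: the helper names (slugify, csv_path, image_path, write_category_csv), the file name cleaning rule, and the sample record are all hypothetical, and the project's own code may build its paths and rows differently.

```python
import csv
import re
import tempfile
from pathlib import Path

# Column names follow the list above.
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]


def slugify(name: str) -> str:
    """Make a name safe to use as a file name (hypothetical cleaning rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return slug or "untitled"


def csv_path(root: Path, category: str) -> Path:
    """Build <root>/csv/<category_name>.csv."""
    return root / "csv" / f"{slugify(category)}.csv"


def image_path(root: Path, category: str, title: str) -> Path:
    """Build <root>/images/<category_name>/<book_name>.jpg."""
    return root / "images" / slugify(category) / f"{slugify(title)}.jpg"


def write_category_csv(root: Path, category: str, books: list) -> Path:
    """Write one CSV file for a category, one row per book."""
    path = csv_path(root, category)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(books)
    return path


# Made-up sample record; every value here is a placeholder.
sample = {
    "product_page_url": "https://example.com/a-light-in-the-attic",
    "universal_product_code": "0000000000000000",
    "title": "A Light in the Attic",
    "price_including_tax": 51.77,
    "price_excluding_tax": 51.77,
    "number_available": 22,
    "product_description": "Placeholder description.",
    "category": "Poetry",
    "review_rating": 3,
    "image_url": "https://example.com/cover.jpg",
}

# Write into a temporary folder so the demo does not touch a real data/ directory.
demo_root = Path(tempfile.mkdtemp()) / "data"
written = write_category_csv(demo_root, "Poetry", [sample])
print(written)
print(image_path(demo_root, "Poetry", sample["title"]))
```

Cleaning category and book names before using them as file names avoids problems with characters such as "/" or ":" that some file systems reject.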

Installation

  1. Install Python;

  2. Clone the project into the desired directory;

    git clone https://github.com/KDerec/bookscrap.git
  3. Change directory to the project folder;

    cd path/to/bookscrap
  4. Create a virtual environment (more details in "Creating a virtual environment");

    • For Windows:
      python -m venv env
    • For Linux:
      python3 -m venv env
  5. Activate the virtual environment;

    • For Windows:
      .\env\Scripts\activate
    • For Linux:
      source env/bin/activate
  6. Install the packages listed in requirements.txt;

    pip install -r requirements.txt
  7. Run main.py and enjoy!
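
If you are unsure whether the virtual environment from steps 4 and 5 is active, a small check like the one below may help. It is a sketch, not part of the project: the function name in_virtualenv is hypothetical, and the check relies on environments created with the venv module, where sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment folder while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

Running this with the same python command you plan to use for main.py shows which interpreter will receive the packages installed in step 6.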

Python installation

  1. Install Python. If you are using Linux or macOS, it should be available on your system already. If you are a Windows user, you can get an installer from the Python homepage and follow the instructions to install it:

    • Go to python.org
    • Under the Download section, click the link for Python "3.xxx".
    • At the bottom of the page, click the Windows Installer link to download the installer file.
    • When it has downloaded, run it.
    • On the first installer page, make sure you check the "Add Python 3.xxx to PATH" checkbox.
    • Click Install, then click Close when the installation has finished.
  2. Open your command prompt (Windows) or terminal (macOS/Linux). To check that Python is installed, enter the following command, which should print a version number:

    python -V
    # If the above fails, try:
    python3 -V
    # Or, if the "py" command is available, try:
    py -V

Contact

Kévin Dérécusson - kevin.derecusson@outlook.fr

Project Link: https://github.com/KDerec/bookscrap
