Project 2 of the OpenClassrooms path "Developer Python": Book to Scrape. The program extracts the following information from http://books.toscrape.com into a CSV file:
- product_page_url
- universal_product_code (upc)
- title
- price_including_tax
- price_excluding_tax
- number_available
- product_description
- category
- review_rating
- image_url
This information should be extracted for every single book, organised by category as on the website.
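As a rough illustration of the target output, the fields listed above can serve as the CSV header. The sketch below uses only Python's standard `csv` module. The function name `write_books_csv`, the column spelling `universal_product_code`, and the sample record are assumptions made for this example and are not taken from the project's code.

```python
import csv
import io

# Column names taken from the field list above.
# The spelling "universal_product_code" is an assumption.
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]


def write_books_csv(books, stream):
    """Write one header row plus one row per book dictionary (hypothetical helper)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for book in books:
        # Missing keys become empty cells; unknown keys raise ValueError.
        writer.writerow(book)


# Placeholder record for illustration only, not real scraped data.
sample_book = {name: "" for name in FIELDNAMES}
sample_book["title"] = "Example Title"
sample_book["category"] = "Example Category"

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_books_csv([sample_book], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function could be given a file opened with `open(path, "w", newline="", encoding="utf-8")`, which is the mode the `csv` documentation recommends.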
An improved version (2.0) is available in branch dev/version-2.0
Open a terminal and run the following commands:
git clone https://github.com/DoriDoro/Book_to_Scrape.git
cd Book_to_Scrape
python -m venv venv
. venv/bin/activate (on macOS/Linux)
venv\Scripts\activate (on Windows)
pip install -r requirements.txt
- Configuring a Python environment
- Managing data using the ETL process
- Using version control with Git and GitHub
- Applying the basics of Python programming
Start the program with python3 main.py
The results are visible in the following folders: