bookscrap: A Python repository from KDerec

Web scraping

This student project is the first of my training.
You can follow the next one here.

Table of Contents

About The Project
Installation
- Python installation
Contact

About The Project

🌱 Developed skills

Configure a Python environment.
Apply the basics of Python programming.
Use the Git and GitHub version control system.
Manage data with the ETL (Extract-Transform-Load) process.
Use the BeautifulSoup, requests and csv libraries.
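
To illustrate the ETL process mentioned above, here is a minimal sketch using only the standard library. The Extract step is stubbed with a hard-coded record (in the project it would rely on requests and BeautifulSoup), and the names `RAW_BOOKS`, `RATING_WORDS`, `extract`, `transform`, `load` and `run_etl` are hypothetical, chosen for this example rather than taken from the repository. The raw price and rating formats are also assumptions.

```python
import csv
import io

# Hypothetical raw record, standing in for what the Extract step would scrape.
RAW_BOOKS = [
    {
        "title": "  A Light in the Attic ",
        "price_including_tax": "£51.77",
        "review_rating": "Three",
    },
]

# Assumed mapping from a rating written as a word to an integer.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def extract():
    """Extract: return raw records (stubbed, no network access here)."""
    return RAW_BOOKS


def transform(raw):
    """Transform: trim the title, convert the price to float and the rating to int."""
    return {
        "title": raw["title"].strip(),
        "price_including_tax": float(raw["price_including_tax"].lstrip("£")),
        "review_rating": RATING_WORDS[raw["review_rating"]],
    }


def load(rows, stream):
    """Load: write the cleaned rows as CSV to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def run_etl():
    """Chain the three steps and return the resulting CSV text."""
    buffer = io.StringIO()
    rows = [transform(raw) for raw in extract()]
    load(rows, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(run_etl())
```

Keeping the three steps in separate functions makes it possible to test the transformation without touching the network or the file system.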

📖 Scenario

I am a marketing analyst at Books Online, a large online bookstore specializing in used books.
As part of my job, I try to manually 😔 track used book prices on competitors' websites, but it's too much work.
My team and I decided to automate this laborious task with a program (a scraper) 💡 developed in Python, which is able to extract pricing information from other online bookstores.

🚧 Project goal

Sam, my team leader, asked me to develop a beta version of this system to track book prices at Books to Scrape, an online book retailer.
In this beta version, the program will simply be an on-demand executable application aimed at retrieving prices at the time of its execution.

🚀 Deliverable

The Books to Scrape library is composed of categories, and each category is composed of books.
For each category, a CSV file is created at data/csv/category_name.csv with the following information for each book:

product_page_url
universal_product_code
title
price_including_tax
price_excluding_tax
number_available
product_description
category
review_rating
image_url
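
As a rough sketch of how one CSV file per category could be written with these columns, the example below uses `csv.DictWriter` from the standard library. The function name `write_category_csv`, the `FIELDNAMES` constant and the `base_dir` parameter are assumptions made for illustration and may not match the actual code in main.py.

```python
import csv
from pathlib import Path

# Column names taken from the list above, in the same order.
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]


def write_category_csv(category_name, books, base_dir="data/csv"):
    """Write one CSV file for a category.

    `books` is a list of dicts keyed by FIELDNAMES (hypothetical structure).
    Returns the path of the file that was written.
    """
    target = Path(base_dir) / f"{category_name}.csv"
    # Create data/csv (or the chosen base directory) if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(books)
    return target
```

Calling `write_category_csv("poetry", books)` would then produce data/csv/poetry.csv, with a header row followed by one row per book.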

For each book, the related image is saved at data/images/category_name/book_name.jpg
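
The image layout described above could be sketched as follows. The helper names `image_path` and `save_image` are hypothetical, and replacing characters that are unsafe in file names is an assumption of this example, not necessarily what the project does. The image bytes are passed in directly, so the download itself (for instance with requests) is left out.

```python
import re
from pathlib import Path


def image_path(category_name, book_name, base_dir="data/images"):
    """Build data/images/category_name/book_name.jpg for a given book."""
    # Assumption: characters that are invalid in file names become underscores.
    safe_name = re.sub(r'[\\/:*?"<>|]', "_", book_name).strip()
    return Path(base_dir) / category_name / f"{safe_name}.jpg"


def save_image(image_bytes, category_name, book_name, base_dir="data/images"):
    """Write already-downloaded image bytes to the expected location."""
    target = image_path(category_name, book_name, base_dir)
    # Create the category folder on first use.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(image_bytes)
    return target
```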

(back to top)

Installation

Install Python;

Clone the project into the desired directory;

git clone https://github.com/KDerec/bookscrap.git

Change directory to the project folder;
```
cd path/to/bookscrap
```
Create a virtual environment (more details in Creating a virtual environment);
- For Windows:
```
python -m venv env
```
- For Linux:
```
python3 -m venv env
```
Activate the virtual environment;
- For Windows:
```
.\env\Scripts\activate
```
- For Linux:
```
source env/bin/activate
```
Install the packages listed in requirements.txt;
```
pip install -r requirements.txt
```
Run main.py and enjoy!

Python installation

Install Python. If you are using Linux or macOS, it should be available on your system already. If you are a Windows user, you can get an installer from the Python homepage and follow the instructions to install it:
- Go to python.org
- Under the Download section, click the link for Python "3.xxx".
- At the bottom of the page, click the Windows Installer link to download the installer file.
- When it has downloaded, run it.
- On the first installer page, make sure you check the "Add Python 3.xxx to PATH" checkbox.
- Click Install, then click Close when the installation has finished.
Open your command prompt (Windows) or terminal (macOS/Linux). To check that Python is installed, enter the following command, which should return a version number:
```
python -V
# If the above fails, try:
python3 -V
# Or, if the "py" command is available, try:
py -V
```

(back to top)

Contact

Kévin Dérécusson - kevin.derecusson@outlook.fr

Project Link: https://github.com/KDerec/bookscrap

(back to top)
