Table of Contents
- Configure a Python environment.
- Apply the basics of Python programming.
- Use the Git and GitHub version control system.
- Manage data with the ETL (Extract-Transform-Load) process.
- Use the BeautifulSoup, requests, and csv libraries.
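As an illustration of the Transform step of ETL, here is a minimal sketch that cleans raw strings into typed values. It is not taken from the project's code. The raw formats shown ("£12.34", "In stock (22 available)", "star-rating Three") are assumptions about what a scraped page might contain, and the function names are hypothetical.

```python
import re

# Assumed mapping from the rating word found in the markup to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(raw):
    """Turn a price string such as '£12.34' into a float."""
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    return float(match.group())


def parse_availability(raw):
    """Return the stock count from text like 'In stock (22 available)', or 0."""
    match = re.search(r"\((\d+) available\)", raw)
    return int(match.group(1)) if match else 0


def parse_rating(css_class):
    """Map a class string like 'star-rating Three' to 3, or 0 if unknown."""
    for word in css_class.split():
        if word in RATING_WORDS:
            return RATING_WORDS[word]
    return 0
```

Keeping each conversion in its own small function makes it easy to adjust one rule if a site formats its values differently.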
I am a marketing analyst at Books Online, a large online bookstore specializing in used books.
As part of my job, I try to manually 😔 track used book prices on competitors' websites, but it's too much work.
My team and I decided to automate this laborious task with a program (a scraper) 💡 developed in Python, which is able to extract pricing information from other online bookstores.
Sam, my team leader, asked me to develop a beta version of this system to track book prices at Books to Scrape, an online book retailer.
In this beta version, the program is simply an on-demand executable application that retrieves prices at the time it is run.
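To make the idea concrete, here is a minimal sketch of the price-extraction step. It is not the project's actual code. It assumes that each price sits in a `<p class="price_color">` element (an assumption about the Books to Scrape markup), it parses a hard-coded sample string rather than a downloaded page, and it uses Python's built-in html.parser instead of BeautifulSoup so that it runs without installing anything. The names PriceParser, extract_prices, and SAMPLE_HTML are illustrative.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <p> element carrying the price_color class."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "price_color" in classes:
            self._in_price = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(text)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_price = False


def extract_prices(html):
    """Return the raw price strings found in an HTML document."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Sample markup made up for this example, not copied from a real page.
SAMPLE_HTML = (
    '<article class="product_pod">'
    '<h3><a title="Example Book">Example Book</a></h3>'
    '<p class="price_color">£12.34</p>'
    "</article>"
)

if __name__ == "__main__":
    print(extract_prices(SAMPLE_HTML))
```

In the real program the HTML string would come from an HTTP request to the bookstore at run time, which is why the prices reflect the moment of execution.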
The Books to Scrape library is organized into categories, and each category contains books.
For each category, a CSV file is created at data/csv/category_name.csv with the following information for each book:
- product_page_url
- universal_product_code
- title
- price_including_tax
- price_excluding_tax
- number_available
- product_description
- category
- review_rating
- image_url
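The layout described above can be sketched with the standard csv module. This is a minimal illustration under stated assumptions, not the project's implementation: the function name write_category_csv and its base_dir parameter are hypothetical, and the column names follow the list above.

```python
import csv
from pathlib import Path

# Column order taken from the field list in this README.
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]


def write_category_csv(category_name, books, base_dir="data/csv"):
    """Write one CSV file for a category and return its path.

    `books` is an iterable of dicts keyed by FIELDNAMES; missing keys
    are written as empty cells.
    """
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{category_name}.csv"
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(books)
    return csv_path
```

Calling `write_category_csv("travel", rows)` would then produce data/csv/travel.csv with one header row followed by one row per book.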
- Clone the project into the desired directory:
  git clone https://github.com/KDerec/bookscrap.git
- Change directory to the project folder:
  cd path/to/bookscrap
- Create a virtual environment (for more detail, see Creating a virtual environment):
  - For Windows:
    python -m venv env
  - For Linux:
    python3 -m venv env
- Activate the virtual environment:
  - For Windows:
    .\env\Scripts\activate
  - For Linux:
    source env/bin/activate
- Install the packages listed in requirements.txt:
  pip install -r requirements.txt
- Run main.py and enjoy!
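If you are unsure whether the virtual environment from the steps above is active, a small check like the following can help. It is an optional sketch, not part of the project. The function name in_virtual_environment is hypothetical, and the check relies on the fact that environments created by venv make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate env before installing packages.")
```

Running it before `pip install -r requirements.txt` helps avoid installing the packages into the system-wide Python by mistake.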
- Install Python. If you are using Linux or macOS, it should already be available on your system. If you are a Windows user, you can get an installer from the Python homepage and follow these instructions to install it:
  - Go to python.org.
  - Under the Download section, click the link for Python "3.xxx".
  - At the bottom of the page, click the Windows Installer link to download the installer file.
  - When it has downloaded, run it.
  - On the first installer page, make sure you check the "Add Python 3.xxx to PATH" checkbox.
  - Click Install, then click Close when the installation has finished.
- Open your command prompt (Windows) or terminal (macOS/Linux). To check that Python is installed, enter the following command, which should return a version number:
  python -V
  # If the above fails, try:
  python3 -V
  # Or, if the "py" command is available, try:
  py -V
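The same check can be made from inside Python, which also shows which interpreter is actually being used. This is a small optional sketch. The function names are hypothetical, and it only checks for major version 3 because this README does not state a more specific minimum.

```python
import platform
import sys


def python_version_report():
    """Return a one-line description of the running interpreter."""
    return f"Python {platform.python_version()} ({sys.executable})"


def is_python_3():
    """Return True when the running interpreter is some Python 3 release."""
    return sys.version_info.major == 3


if __name__ == "__main__":
    print(python_version_report())
    if not is_python_3():
        print("This project expects Python 3; see the installation steps above.")
```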
Kévin Dérécusson - kevin.derecusson@outlook.fr
Project Link: https://github.com/KDerec/bookscrap
This student project is the first of my training, and you can follow the next one here.