Web Scraping JSON Data and Images from HTML: Python Tutorial 📖

This repository contains a Python-based project demonstrating web scraping techniques to extract book data and images from a website. The project is inspired by the tutorial “Web Scraping JSON Data and Images from HTML: Python Tutorial 📖” by Moulaye Eli, published on CamelCodes.net.

📚 Overview

This project focuses on:

Extracting book details and images from the CamelCodes Books Page.
Structuring the extracted data into a JSON format.
Processing and saving images as thumbnails.

The aim is to walk through a complete scraping workflow that handles both textual data (book details) and visual data (images).

🔧 Prerequisites

This project requires Python 3 and the following Python libraries:

beautifulsoup4 for parsing HTML content.
pillow for image processing.
requests for making HTTP requests.

Create a requirements.txt file with the following content:

beautifulsoup4==4.12.3
pillow==10.2.0
requests==2.31.0

Install the dependencies using the following command:

python3 -m pip install -r requirements.txt
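
After installing, it can help to confirm that the environment matches the pinned versions. The snippet below is a minimal sketch using only the standard library (Python 3.8+); the `check_dependencies` helper and the `PINNED` dictionary are hypothetical names introduced here and are not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the requirements.txt content above.
PINNED = {
    "beautifulsoup4": "4.12.3",
    "pillow": "10.2.0",
    "requests": "2.31.0",
}


def check_dependencies(pinned):
    """Return {package: (expected, installed_or_None)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_dependencies(PINNED).items():
        status = "OK" if installed == expected else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

A MISMATCH line does not necessarily mean the project will fail; it only signals that the installed version differs from the one listed above.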

🛠️ Project Structure

file_utils.py: Contains utility functions for file operations such as creating directories and saving JSON files.
image_utils.py: Includes functions for downloading, resizing, and saving images.
web_utils.py: Manages web requests and HTML content retrieval.
data_extractors.py: Extracts and processes book information from HTML content.
timing_utils.py: Provides timing utilities to measure script execution duration.
main.py: Orchestrates the web scraping process, including data extraction and image processing.
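
To make the roles of the utility modules more concrete, here is a rough sketch of what the helpers in file_utils.py and timing_utils.py might look like. The function names (`ensure_directory`, `save_json`, `timed`) and the folder layout are assumptions made for illustration; the actual modules may be organized differently.

```python
import json
import tempfile
import time
from functools import wraps
from pathlib import Path


def ensure_directory(path):
    """Create the directory (and any missing parents) if it does not exist."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def save_json(data, file_path):
    """Write data to file_path as UTF-8 JSON, creating the parent folder first."""
    file_path = Path(file_path)
    ensure_directory(file_path.parent)
    with file_path.open("w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)
    return file_path


def timed(func):
    """Decorator that prints how long the wrapped function took to run."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} finished in {elapsed:.3f}s")
    return wrapper


@timed
def demo():
    # A temporary folder stands in for the project's real output directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = save_json([{"title": "Example Book"}], Path(tmp) / "json" / "books.json")
        return target.name


if __name__ == "__main__":
    demo()
```

Wrapping the timing logic in a decorator keeps main.py free of stopwatch code, and `time.perf_counter()` is generally preferred over `time.time()` for measuring elapsed durations.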

🚀 Usage

Setup: The script creates the folders needed for JSON and image storage, so you do not have to create them by hand.
Execution: The script will fetch HTML content from the target URL, extract book details, and process images.
Output: The extracted data will be saved in a JSON file, and images will be resized and stored in the designated folder.
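
The extraction step described above can be sketched as follows. To keep the example runnable offline, an inline HTML sample replaces the fetched page. The markup (`div.book`, `.title`, `.author`), the `BASE_URL` placeholder, and the `extract_books` function are all assumptions made for illustration; the real Books Page and data_extractors.py may differ.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the real page may use different tags and class names.
SAMPLE_HTML = """
<div class="book">
  <h2 class="title">Example Book One</h2>
  <p class="author">Jane Doe</p>
  <img src="/images/book-one.jpg" alt="Example Book One">
</div>
<div class="book">
  <h2 class="title">Example Book Two</h2>
  <p class="author">John Smith</p>
</div>
"""

BASE_URL = "https://example.com/books/"  # placeholder, not the real target URL


def extract_books(html, base_url):
    """Return a list of dicts, one per book card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("div.book"):
        title = card.select_one(".title")
        author = card.select_one(".author")
        image = card.select_one("img")
        has_src = image is not None and image.get("src")
        books.append({
            "title": title.get_text(strip=True) if title else None,
            "author": author.get_text(strip=True) if author else None,
            # Resolve relative image paths against the page URL.
            "image_url": urljoin(base_url, image["src"]) if has_src else None,
        })
    return books


if __name__ == "__main__":
    print(json.dumps(extract_books(SAMPLE_HTML, BASE_URL), indent=2))
```

Missing elements are recorded as `None` rather than raising an error, so one incomplete entry does not stop the whole run.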

To execute the script, run:

python3 main.py

🖥️ Example Output

Upon successful execution, you will find:

A books.json file containing the structured book data.
Resized images saved in the specified directory.
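
As an illustration of how the resized images might be produced, the sketch below shrinks an image with Pillow and saves it as a JPEG. The `save_thumbnail` function, the 128x128 bound, and the JPEG output format are assumptions, not confirmed details of image_utils.py. An image generated in memory stands in for downloaded bytes so the example runs without network access.

```python
import tempfile
from io import BytesIO
from pathlib import Path

from PIL import Image


def save_thumbnail(image_bytes, output_path, max_size=(128, 128)):
    """Shrink an image to fit within max_size, keeping aspect ratio, and save it."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(BytesIO(image_bytes)) as source:
        thumb = source.convert("RGB")  # JPEG cannot store alpha or palette modes
        thumb.thumbnail(max_size)      # resizes in place and never enlarges
        thumb.save(output_path, format="JPEG")
        return thumb.size


if __name__ == "__main__":
    # Build a 400x200 test image in memory instead of downloading one.
    buffer = BytesIO()
    Image.new("RGB", (400, 200), "navy").save(buffer, format="PNG")

    with tempfile.TemporaryDirectory() as tmp:
        size = save_thumbnail(buffer.getvalue(), Path(tmp) / "images" / "cover.jpg")
        print("thumbnail size:", size)
```

`Image.thumbnail` is used instead of `Image.resize` because it preserves the aspect ratio and leaves images that are already small enough untouched.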

📜 Acknowledgements

This project is inspired by the tutorial Web Scraping JSON Data and Images from HTML: Python Tutorial 📖 by Moulaye Eli.