/Amazon-scraping.PY

Primary language: Python | License: MIT

Hello 👋, I'm Ryo

An independent backend developer

Welcome To Amazon-Scraping.PY 🛍️

This program is my training project for scraping data from Amazon's website.

Features

  • Retrieves all product data in the Electronics categories
  • Uses the logging library to record each request and its response status
  • Uses a retry function to handle request timeout errors
  • Uses the fake_useragent library to reduce the chance of 503 (service unavailable) responses caused by heavy or blocked traffic
  • Uses the PyQuery library for HTML parsing, which makes working with selectors easier
  • Uses the icecream library to make debugging easier
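
The retry and logging features above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this repository: the `retry` helper, its parameters, and the `amazon_scraper` logger name are assumptions made for the example.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical logger name; the project may configure logging differently.
logger = logging.getLogger("amazon_scraper")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Call func until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            result = func()
            logger.info("attempt %d succeeded", attempt)
            return result
        except exceptions as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Wait a little longer after each failure (linear backoff).
            time.sleep(delay * attempt)
```

With requests and fake_useragent installed, such a helper could wrap a call like `requests.get(url, headers={"User-Agent": UserAgent().random}, timeout=10)` passed in as a lambda, with `exceptions=(requests.exceptions.Timeout,)`, so that each attempt is sent with a fresh user-agent string.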

Tech

  • requests is an easy-to-use Python library for interacting with APIs and making HTTP requests
  • pyquery is a Python library that allows HTML and XML manipulation with a syntax similar to jQuery
  • fake_useragent is a Python library that provides an easy way to generate fake user-agent strings for HTTP requests
  • icecream is a Python library that provides a simple and informative way to print debug output, which helps when following a program's execution flow

Requirements

Installation

To run this program, install the required libraries with the following command:

pip install requests pyquery icecream fake_useragent

Example Usage

# Clone this repository
git clone https://github.com/ryosoraa/Amazon-scraping.PY.git

# Go into the directory
cd Amazon-scraping.PY

# Run the program
python main.py

🚀 Structure

│   LICENSE
│   main.py
│   README.md
│
├───data
│   ├───Camera_&_Photo
│   │   ├───all
│   │   └───page
│   │
│   ├───Electronics_Accessories_&_Supplies
│   │   ├───all
│   │   └───page
│   │
│   └───Results
├───libs
│   │   __init__.py
│   │
│   ├───service
│   │   │   html_parser.py
│   │   │
│   │   └───__pycache__
│   │           html_parser.cpython-312.pyc
│   │
│   ├───utils
│   │   │   logs.py
│   │   │   parser.py
│   │   │   writer.py
│   │   │
│   │   └───__pycache__
│   │
│   └───__pycache__
│
└───logs
        logging.log

Author

👤 Rio Dwi Saputra

Ryo's LinkedIn Ryo's Instagram