ScrapyPriceMonitoring

This README covers the project's architecture, directory structure, installation and usage instructions, and the modules for data extraction, transformation, and visualization.

A Python ETL for Price Monitoring

A Python solution for pricing strategies. The project is an ETL pipeline that collects, consolidates, and generates insights about a specific category of products. It crawls a website that you specify and extracts prices, titles, descriptions, and ratings from dozens or hundreds of pages. Pandas transforms the data, which is then loaded into a table in a PostgreSQL database. Insights and dashboards are generated automatically.

Architecture

A Python ETL for Web Scraping

Extraction - Scrapy
Transformation and Load - Pandas
Dashboard - Streamlit
Database - PostgreSQL
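
The load stage listed above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a database server, and the table name (prices) and its columns are hypothetical, not taken from the project. Against PostgreSQL, the same idea would typically go through a driver such as psycopg2 or through pandas DataFrame.to_sql.

```python
import sqlite3


def load_items(conn, items):
    """Insert scraped items into a table and return the total row count.

    Assumptions: `items` is a list of dicts; the table name `prices` and
    the columns title/price/rating are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (title TEXT, price REAL, rating REAL)"
    )
    # Missing keys become NULL instead of raising KeyError.
    rows = [(i.get("title"), i.get("price"), i.get("rating")) for i in items]
    conn.executemany(
        "INSERT INTO prices (title, price, rating) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory stand-in database
    sample = [{"title": "Example product", "price": 1299.9, "rating": 4.5}]
    print(load_items(connection, sample))
```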

Diagram

Directory Structure

ScrapyPriceMonitoring/
├── scrapy_monitoring/
│   ├── spiders/
│   │   └── price_spider.py
│   ├── pipelines.py
│   ├── items.py
│   └── settings.py
├── transformation/
│   └── transform.py
├── dashboard/
│   └── app.py
├── requirements.txt
└── README.md

Documentation

GitHub Pages

How to Use

To run the web scraper:

scrapy crawl mercadolivre -o ../../data/data.jsonl
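
The -o option with a .jsonl file name makes Scrapy write one JSON object per line. A minimal sketch of reading that output back in Python is shown here, using only the standard library. The field names in the sample (title, price) are assumptions based on the data described above, not the spider's confirmed schema.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    items = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            items.append(json.loads(line))
    return items


def read_jsonl(path):
    """Read a JSON Lines file such as the one produced by `scrapy crawl ... -o`."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


if __name__ == "__main__":
    # Hypothetical sample lines; real field names depend on the spider.
    sample = ['{"title": "Example product", "price": "R$ 1.299,90"}', ""]
    print(parse_jsonl(sample))
```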

To run the Pandas transformation, navigate to the SRC folder:

python transformation/main.py
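
One cleaning step a transformation like this usually needs is turning scraped price strings into numbers. The sketch below is an assumption about what that step might look like, not the project's actual code: the function name parse_price is hypothetical, and it assumes prices use "." as the thousands separator and "," as the decimal mark. With Pandas, such a function could be applied to a column with Series.map.

```python
import re


def parse_price(raw):
    """Convert a price string such as 'R$ 1.299,90' to a float, or None.

    Assumption: '.' is a thousands separator and ',' is the decimal mark.
    """
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)  # already numeric, nothing to clean
    # Keep only digits and separators, dropping currency symbols and spaces.
    digits = re.sub(r"[^\d.,]", "", str(raw))
    if not digits:
        return None
    normalized = digits.replace(".", "").replace(",", ".")
    try:
        return float(normalized)
    except ValueError:  # e.g. more than one decimal mark
        return None


if __name__ == "__main__":
    for value in ["R$ 1.299,90", "59", "", None]:
        print(repr(value), "->", parse_price(value))
```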

To run Streamlit and build dashboards:

streamlit run app.py
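
The kind of summary a dashboard might display can be sketched in plain Python. This is an illustrative assumption, not the contents of app.py: the function name summarize_prices and the price field are hypothetical. In a Streamlit app, each value could be shown with a call such as st.metric.

```python
from statistics import mean


def summarize_prices(items):
    """Return count, min, max, and mean of the numeric 'price' field.

    Assumption: each item is a dict whose 'price' is a number or missing.
    """
    prices = [item["price"] for item in items if item.get("price") is not None]
    if not prices:
        return {"count": 0, "min": None, "max": None, "mean": None}
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "mean": mean(prices),
    }


if __name__ == "__main__":
    sample = [{"price": 10.0}, {"price": 30.0}, {"title": "no price"}]
    print(summarize_prices(sample))
```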

Modules

Extraction

Quick Start

Read the Scrapy documentation: https://scrapy.org/


GustavoSantanaData/ETL-For-Price-Monitoring