A Python ETL for Price Monitoring
A Python solution to support pricing strategies. The project is an ETL pipeline that collects, consolidates, and generates insights about a specific product category. It crawls the website you configure and extracts prices, titles, descriptions, and ratings from dozens or hundreds of pages. The data is transformed with Pandas and loaded into a table in a PostgreSQL database. Insights and dashboards are then generated automatically.
A Python ETL for Web Scraping
- Extraction - Scrapy
- Transformation and Load - Pandas
- Dashboard - Streamlit
- Database - PostgreSQL
ScrapyPriceMonitoring/
├── scrapy_monitoring/
│   ├── spiders/
│   │   └── price_spider.py
│   ├── pipelines.py
│   ├── items.py
│   └── settings.py
├── transformation/
│   └── transform.py
├── dashboard/
│   └── app.py
├── requirements.txt
└── README.md
To run the web scraper:
scrapy crawl mercadolivre -o ../../data/data.jsonl
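Because the output file ends in `.jsonl`, Scrapy writes one JSON object per line (JSON Lines). The following is a minimal sketch for sanity-checking that file with the standard library only. The field names `title` and `price`, the helper names, and the sample records are assumptions for illustration, not the spider's actual schema.

```python
import json
from io import StringIO


def read_jsonl(stream):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    records = []
    for line in stream:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def missing_fields(records, required=("title", "price")):
    """Return (record index, field name) pairs for empty or absent required fields."""
    return [
        (i, field)
        for i, record in enumerate(records)
        for field in required
        if record.get(field) in (None, "")
    ]


# Hypothetical sample standing in for the scraped file.
sample = StringIO(
    '{"title": "Running shoe A", "price": "199.90"}\n'
    '{"title": "Running shoe B", "price": null}\n'
)
records = read_jsonl(sample)
problems = missing_fields(records)
print(len(records), problems)
```

To check the real output, pass an open file handle for the scraped file instead of `sample`. Note that `-o` appends to an existing file, while `-O` overwrites it, so repeated runs with `-o` accumulate records.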
To run the Pandas transformation, navigate to the SRC folder:
python transformation/main.py
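As a rough illustration of what a transformation and load step like this can look like, the sketch below reads JSON Lines records, converts prices and ratings to numbers, drops rows without a price, and writes the result to a table. The column names, the `transform` function, and the table name `prices` are assumptions. An in-memory SQLite database stands in for PostgreSQL so the sketch runs without a server.

```python
import sqlite3
from io import StringIO

import pandas as pd

# Hypothetical records; the real input is the JSON Lines file from the spider.
raw = StringIO(
    '{"title": "Running shoe A", "price": "199.90", "rating": "4.5"}\n'
    '{"title": "Running shoe B", "price": null, "rating": "4.1"}\n'
    '{"title": "Running shoe C", "price": "89.00", "rating": null}\n'
)


def transform(df):
    """Convert price and rating to numbers and drop rows without a price."""
    out = df.copy()
    # errors="coerce" turns unparseable values into NaN instead of raising.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    return out.dropna(subset=["price"]).reset_index(drop=True)


df = transform(pd.read_json(raw, lines=True))

# SQLite is a stand-in here (assumption); the project itself targets PostgreSQL.
conn = sqlite3.connect(":memory:")
df.to_sql("prices", conn, if_exists="replace", index=False)
count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(count)
```

For PostgreSQL, `DataFrame.to_sql` accepts a SQLAlchemy engine in place of the SQLite connection. The connection details depend on your own database setup.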
To run Streamlit and build dashboards:
streamlit run app.py
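A dashboard script for this kind of project could be organized along the lines below. This is a sketch, not the project's actual `app.py`: the `summarize` and `render` functions, the column names, and the sample data are assumptions. Streamlit is imported inside `render` so the aggregation logic can be run and tested without it.

```python
import pandas as pd


def summarize(df):
    """Compute headline numbers for the dashboard from cleaned price data."""
    return {
        "products": int(len(df)),
        "avg_price": round(float(df["price"].mean()), 2),
        "min_price": float(df["price"].min()),
        "max_price": float(df["price"].max()),
    }


def render(df):
    """Draw the page; Streamlit is imported lazily so summarize() works without it."""
    import streamlit as st

    stats = summarize(df)
    st.title("Price monitoring")
    col1, col2, col3 = st.columns(3)
    col1.metric("Products", stats["products"])
    col2.metric("Average price", stats["avg_price"])
    col3.metric("Highest price", stats["max_price"])
    st.bar_chart(df.set_index("title")["price"])
    st.dataframe(df)


# Hypothetical cleaned data; a real app would query the PostgreSQL table instead.
sample = pd.DataFrame(
    {"title": ["Running shoe A", "Running shoe C"], "price": [199.90, 89.00]}
)
stats = summarize(sample)
print(stats)
# In a Streamlit script, call render(sample) at module level so that
# `streamlit run` draws the page each time the script is executed.
```

Keeping the calculations in a plain function separate from the rendering calls makes it possible to check the numbers without starting the Streamlit server.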
Quick Start
Read the Scrapy documentation - https://scrapy.org/