elt-brasileirao

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Scraper Brasileirao

Stack: Docker, Prefect, ClickHouse, PySpark, Pandas


Project Organization

├── README.md               <- The top-level README for developers using this project
├── docker
├── clickhouse              <- ClickHouse artifacts and DB
├── minio
│   ├── bucket_setup.sh     <- Script to set up the storage instance
│   └── data                <- MinIO artifacts and DB
├── prefect
│   ├── create_blocks.py    <- Script to store credentials in Prefect blocks
│   ├── prefect.yaml        <- Config file to register the Prefect workflow
│   └── database            <- Prefect artifacts and DB
├── scraper                 <- Workflow code
│   └── workflows
│       ├── src             <- Source code modules
│       └── elt.py          <- Main reference file for the workflow
│
└── build.sh                <- Script to build containers and set up services
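The repository name and layout point to an ELT pipeline: the scraper extracts raw data, which is loaded into storage (MinIO for objects, ClickHouse as the analytical store) before it is transformed. The following is a minimal, stdlib-only sketch of that extract → load → transform ordering. Every name in it (`extract`, `load_raw`, `transform`, the in-memory `bucket` dict standing in for a MinIO bucket, and the placeholder match rows) is a hypothetical illustration. It is not the code in `scraper/workflows/elt.py`, which runs its steps as a Prefect workflow.

```python
import csv
import io

# Hypothetical stand-in for a MinIO bucket: object key -> raw bytes.
bucket: dict[str, bytes] = {}

FIELDS = ["round", "home", "away", "score"]


def extract() -> list[dict[str, str]]:
    """Stand-in for the scraper: returns raw, untyped rows (placeholder data)."""
    return [
        {"round": "1", "home": "Team A", "away": "Team B", "score": "2-1"},
        {"round": "1", "home": "Team C", "away": "Team D", "score": "0-0"},
    ]


def load_raw(rows: list[dict[str, str]], key: str) -> None:
    """Load step: persist the rows untouched as CSV, before any transformation."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    bucket[key] = buf.getvalue().encode("utf-8")


def transform(key: str) -> dict[str, int]:
    """Transform step: read the raw object back and compute league points."""
    points: dict[str, int] = {}
    text = bucket[key].decode("utf-8")
    for row in csv.DictReader(io.StringIO(text)):
        home_goals, away_goals = (int(g) for g in row["score"].split("-"))
        for team in (row["home"], row["away"]):
            points.setdefault(team, 0)
        if home_goals > away_goals:
            points[row["home"]] += 3  # home win
        elif home_goals < away_goals:
            points[row["away"]] += 3  # away win
        else:
            points[row["home"]] += 1  # draw: one point each
            points[row["away"]] += 1
    return points


if __name__ == "__main__":
    load_raw(extract(), "raw/matches.csv")
    print(transform("raw/matches.csv"))
    # → {'Team A': 3, 'Team B': 0, 'Team C': 1, 'Team D': 1}
```

Keeping the raw object unmodified until the transform step is what distinguishes ELT from ETL. A transformation can be re-run or changed later without scraping the source again.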

Launch the Project

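From the repository root, run the build script, which builds the containers and sets up the services:

sh build.sh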
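Once `build.sh` finishes, you can confirm that the services came up by polling their HTTP health endpoints. The sketch below assumes the containers publish each service's usual default port on `localhost`: ClickHouse HTTP on 8123 (`/ping`), the Prefect server on 4200 (`/api/health`), and MinIO on 9000 (`/minio/health/live`). Those ports and the `SERVICES` mapping are assumptions, so adjust them to whatever the project's Docker configuration actually exposes.

```python
import urllib.request

# Assumed host ports; check the project's Docker setup for the real mappings.
SERVICES = {
    "clickhouse": "http://localhost:8123/ping",
    "prefect": "http://localhost:4200/api/health",
    "minio": "http://localhost:9000/minio/health/live",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:<10} {'up' if is_up(url) else 'down'}")
```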