The aim is to build an easily manageable ETL pipeline: jobs extract weather data from WeatherAPI.com, load it into PostgreSQL, and transform it as required.
The goal is an easy-to-view table of dates and daily average temperatures for my
current city, Berlin (the location can be changed in the Makefile).
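The extract step can be sketched in a few lines. This is a hypothetical illustration, not code from this repo: the RapidAPI host name, the `history.json` endpoint, and the `q`/`dt` parameter names are assumptions about how WeatherAPI.com is called through RapidAPI.

```python
# Hypothetical sketch of the extract step: fetch one day's weather
# history for a city from WeatherAPI.com via RapidAPI.
# Host, endpoint path, and parameter names are assumptions.
import json
import urllib.parse
import urllib.request

RAPIDAPI_HOST = "weatherapi-com.p.rapidapi.com"  # assumed host name


def build_history_request(api_key, city, date):
    """Build the HTTP request for one day (date as YYYY-MM-DD) of history."""
    query = urllib.parse.urlencode({"q": city, "dt": date})
    return urllib.request.Request(
        f"https://{RAPIDAPI_HOST}/history.json?{query}",
        headers={"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": RAPIDAPI_HOST},
    )


def fetch_history(api_key, city, date):
    """Return the raw JSON payload for `city` on `date`."""
    request = build_history_request(api_key, city, date)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

The raw JSON is then loaded into PostgreSQL before any aggregation happens, keeping extract and transform as separate jobs.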
- Apply for an API key from RapidAPI
- Python: 3.7.4
- virtualenv
- GNU Make
- PostgreSQL: 12
sudo apt install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
git clone https://github.com/samuelTyh/ETLdemo.git /to/your/working/directory
cd /to/your/working/directory
Create config.cfg
and fill in your API key, database configuration, etc.
[weather-api]
API_KEY=<YOUR-RAPIDAPI-KEY>
[DB]
HOST=
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_PORT=
YOUR_CITY=<CITY-COUNTRY-OR-REGION-NAME>
make install # install dependencies
make pull # pull the data from API
make run # run ETL jobs
make check # check the results
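The transform behind `make run` boils down to averaging temperatures per date. The SQL below is an assumption about the schema (hypothetical table and column names, not taken from this repo); the equivalent aggregation is shown in plain Python so it can be read without a database:

```python
# Sketch of the transform step: reduce hourly readings to one
# average temperature per date. In the pipeline this would run as
# SQL inside PostgreSQL; the names in the query are assumptions.
from collections import defaultdict

DAILY_AVG_SQL = """
    SELECT reading_date, AVG(temp_c) AS avg_temp_c
    FROM hourly_weather
    GROUP BY reading_date
    ORDER BY reading_date;
"""  # hypothetical table and column names


def daily_averages(rows):
    """rows: iterable of (date, temp_c) pairs -> {date: average temp}."""
    sums = defaultdict(lambda: [0.0, 0])
    for day, temp in rows:
        sums[day][0] += temp
        sums[day][1] += 1
    return {day: total / count for day, (total, count) in sums.items()}
```

Keeping the aggregation in SQL lets PostgreSQL do the grouping and leaves the final table ready to query directly.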
- Set up a cron job to pull data routinely (Apache Airflow)
- Add try/except error handling
- Dockerize and build a test database
- Add monitoring (Apache Airflow)
- Add tests