finance-data-builder

Finance 🏦 Data Builder 🛠️ @ postgres 🐘

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

The finance data builder extracts data from several sources, loads it into a postgres database and transforms it via dbt into beautiful models.
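The extract, load, transform order described above can be sketched in a few lines. This is only an illustration, not the project's code: it swaps in an in-memory SQLite database for postgres and a plain SQL view for a dbt model, and the rows, table names and function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows standing in for data pulled from a source API.
RAW_ROWS = [("AAPL", "189.50"), ("MSFT", "410.25")]


def extract():
    """Extract step: return the raw rows exactly as the source delivered them."""
    return list(RAW_ROWS)


def load(conn, rows):
    """Load step: store the rows untouched in a raw table (prices still text)."""
    conn.execute("CREATE TABLE raw_quotes (symbol TEXT, close TEXT)")
    conn.executemany("INSERT INTO raw_quotes VALUES (?, ?)", rows)


def transform(conn):
    """Transform step: a typed view over the raw table, the role a dbt model plays."""
    conn.execute(
        "CREATE VIEW quotes AS "
        "SELECT symbol, CAST(close AS REAL) AS close FROM raw_quotes"
    )


def run_elt():
    """Run the three steps in order and return the transformed rows."""
    conn = sqlite3.connect(":memory:")
    load(conn, extract())
    transform(conn)
    return conn.execute("SELECT symbol, close FROM quotes ORDER BY symbol").fetchall()
```

The point of the ordering is that the raw table keeps the data as delivered, so the transformation can be changed and re-run without extracting again.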

The data sources are Google News, yahoo! finance and PayPal.

What it is

Airflow

I use Airflow to manage the whole ELT process:

For Google News:

[Figure: Airflow graph for Google News]

For yahoo! finance:

[Figure: Airflow graph for yahoo! finance]

For PayPal:

[Figure: Airflow graph for PayPal]

dbt

I use dbt to transform the data into models:

[Figure: dbt graph]

Get started

Prerequisites

Setup

To run this project, add a .env file to the project root directory and fill it with the following environment variables:

DBT_POSTGRES_HOST=fdb_dbt_db
DBT_POSTGRES_USER=dbt
DBT_POSTGRES_PASSWORD=dbt
DBT_POSTGRES_DB=dbt
DBT_POSTGRES_PORT=5432

AIRFLOW_POSTGRES_HOST=fdb_airflow_db
AIRFLOW_POSTGRES_USER=airflow
AIRFLOW_POSTGRES_PASSWORD=airflow
AIRFLOW_POSTGRES_DB=airflow
AIRFLOW_POSTGRES_PORT=5432

AIRFLOW_USER=airflow
AIRFLOW_PASSWORD=airflow

and then run it via docker-compose:

docker-compose up -d
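A variable missing from .env is an easy mistake to make before starting the containers. The following is a minimal sketch of a check for it, assuming the .env file uses plain KEY=VALUE lines as shown above; the helper names `parse_env` and `missing_keys` are hypothetical and not part of the project.

```python
# Variable names taken from the .env listing above.
REQUIRED_KEYS = [
    "DBT_POSTGRES_HOST",
    "DBT_POSTGRES_USER",
    "DBT_POSTGRES_PASSWORD",
    "DBT_POSTGRES_DB",
    "DBT_POSTGRES_PORT",
    "AIRFLOW_POSTGRES_HOST",
    "AIRFLOW_POSTGRES_USER",
    "AIRFLOW_POSTGRES_PASSWORD",
    "AIRFLOW_POSTGRES_DB",
    "AIRFLOW_POSTGRES_PORT",
    "AIRFLOW_USER",
    "AIRFLOW_PASSWORD",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text):
    """Return the required variables that are absent or empty in the given text."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(open(".env").read())` from the project root would then return an empty list when every variable is set.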

NOTE: To retrieve PayPal data you must authenticate. First create a PayPal app and enable Transaction Search under LIVE APP SETTINGS, then add an Airflow connection with the following information:

Conn Id: http_paypal
Conn Type: HTTP
Host: https://api.paypal.com
Login: <enter-your-CLIENT-ID-here>
Password: <enter-your-SECRET-here>

You should then be able to retrieve your personal PayPal transactions.
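To show what the Login and Password fields are used for, here is a minimal sketch of a PayPal OAuth2 client-credentials token request, which exchanges the client ID and secret for an access token. It only builds the request and does not send it. How this project performs the call inside Airflow is not shown in this README, so treat the function name `build_token_request` and the use of `urllib` as assumptions made for illustration.

```python
import base64
import urllib.parse
import urllib.request

PAYPAL_HOST = "https://api.paypal.com"  # the Host value from the connection above


def build_token_request(client_id, secret):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    # The client ID and secret travel as HTTP Basic credentials.
    credentials = f"{client_id}:{secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        url=f"{PAYPAL_HOST}/v1/oauth2/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; the JSON response carries an `access_token` that is then sent as a Bearer token on later requests.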

Notes

I am using a storage folder to store data files locally. Normally you would probably want remote storage that is designed to hold large amounts of data, such as Amazon S3, Google Cloud Storage or Azure Blob Storage.