Using yfinance, we grab minute-by-minute BTC-USD data, dump it into PostgreSQL, and link Excel via ODBC for quick analysis!
Airflow
Docker Desktop
VSCode
Excel
DBeaver
ODBC Driver PostgreSQL
- To be able to connect to a PostgreSQL database, this video explains a step by step process on how to establish a connection on the Web UI and using
DBeaver
. - I strongly recommend that you learn how to build/extend the Airflow Docker image to download the right modules and dependencies (if you don't know how to do so). Here's a link to a YouTube video I followed, which worked really well: Link to the video.
- I've attached the Dockerfile with the bash scripts, along with the requirements.txt, to install the correct dependencies and modules. You can then build/extend your Docker image.
- In your
Docker-Compose.yaml
file, I personally used the default PostgreSQL config provided. If you don't have this in the 'services' section of the YAML file, please include it.
If you have Airflow set up with the correct image extension and all its dependencies, I will show you how to set up the PostgreSQL ODBC driver on your Windows computer.
-
First, verify the host machine firewall settings if they're allowing connections: 🌎🧱
- Go to
Control-Panel
. - Click on
System and Security
. - Find and Click
Windows Defender Firewall
. - Click on
Allow an app or feature through Windows Defender Firewall
. - Look for
Docker Desktop Backend
and ensure there are two rows each ticked for the Private and Public networks.
- Go to
-
Second, create an inbound rule: 🌎🛡️
- Go to
Control-Panel
. - Click on
System and Security
. - Find and Click
Windows Defender Firewall
. - Click on
Advanced Settings
➡️Inbound Rule
➡️New Rule
- Choose
Port
and clickNext
- Select
TCP
and specify port5432
- Allow the connection and click
Next
- Select when this rule applies (I selected all three
Domain
,Private
,Public
) - Name the rule whatever you want hit finish.
- Go to
-
Third, once you've downloaded the ODBC Driver for postgreSQL, you need to configure it: 🚗
- On Windows, type "ODBC" in the
Start Menu
. - Select whichever bit version your Excel runs on (
32 bit
or64 bit
). - Navigate to ➡️
User DSN
. - Click on
Add
. - Select the
PostgreSQL Unicode
driver. - Click on
Finish
.
- On Windows, type "ODBC" in the
-
Third, you want to configure it: 📝
- Name it however you want in the
data source
section. - You can input an optional
description
if you'd like. - Enter the
database name
you are connecting to in your airflow connection (in my case it's called 'test)'. - Enter the
server name
, if it's on your local machine it's justlocalhost
. - Enter 5432 for your
Port
. - Enter the username and password for the postgreSQL login (this is shown in the
Docker-compose.yaml
file under the postgreSQL services). - Click the drop-down menu on
SSL Mode
and select 'prefer'. - CLick on
Test
to ensure the connection works.
- Name it however you want in the
I'll explain some important parts of the code that I think need attention.
-
For the python DAG script:
- Notice in the DAG, in order to establish a succesful connection to the database, you must use a
PostgresHook(postgres_conn_id='name_of_connection_defined_in_WebUI')
. - You have to then use the
.get_conn()
method followed by the.cursor()
method in order to do any kind of data modificaitons. - It is imperative to then use the
.executemany(sql_script, dataframe.values.tolist())
method to execute your query. - If you want to see those changes being made, then you will see the I had to use the
.commit()
method. - You have to close the connection and the cursor with the
.close()
method once you're done. However, if you have another task that needs to run more SQL queries, you shouldn't close the connection yet. - In the SQL code, notice the section 'ON CONFLICT (timestamp) DO NOTHING.' This ensures that existing timestamps in the table will not be overridden, and only new timestamp values will be added.
- Notice in the DAG, in order to establish a succesful connection to the database, you must use a
-
For the Power Query M script:
- It's similar to my previous mini project where I extracted values from Excel workbooks in order to feed it into a SQL code.
- Except we're not reading the values from the workbook, we're simply going to fill those values in.
- This is the format you want it in: Odbc.Query("Driver={PostgreSQL Unicode};Server=localhost;Database=test;","SELECT * FROM table_name")
- You can find more info about this function by navigating to this Link.