Data Bootcamp week 8 - ETL
This repository includes guided homework exercises and an independent challenge exercise focused on the extract, transform, and load process. The exercise set up a scenario in which a crowdfunding platform was transitioning from tracking data in Excel to using a PostgreSQL database.
The homework portion of the project involved using Python and Pandas to extract data from .csv and .xlsx files and to transform that data before loading it into a PostgreSQL database. An entity relationship diagram (ERD) was created with QuickDBD to depict the database tables and their relationships. SQL queries were written to answer questions about the data. Query results were saved and exported in .csv format.
For the independent challenge portion of the exercise, an additional .csv file was provided, with information about the backers of specific crowdfunding campaigns. The data was presented in a single column of that .csv file, and so the extraction process included steps to break it into several different columns. The images below show how the data looked before and after.
The initial ERD was modified to include a table for information about campaign backers, as shown here:
I wrote queries as specified in the challenge, which are saved here