Crowdfunding-ETL

In this hypothetical exercise, a crowdfunding platform keeps its campaign, contact, and donor data in one large Excel file. The platform has asked for help transforming the data into a clean, easy-to-use database that supports more accurate analysis.

Overview

In this exercise, I will apply the extract, transform, and load (ETL) process to collect, clean, and store data in a database before performing analysis. I will use Python and Pandas for the extract and transform steps. Next, I will design a PostgreSQL database and table schemas from an entity relationship diagram (ERD) and load the data into the database. Finally, I will use SQL to perform data analysis.
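
To illustrate the extract and transform steps, here is a minimal Pandas sketch that reads one sheet of the workbook and uses a regular expression to pull an id and a name out of a combined text column. The file name (crowdfunding.xlsx), sheet index, column names, and pattern are assumptions for illustration, not the project's actual values.

```python
# Minimal extract/transform sketch; the workbook, sheet, and column
# names below are assumptions, not the project's actual values.
import pandas as pd

# Extract: read one sheet of the Excel workbook into a DataFrame.
campaign_df = pd.read_excel("crowdfunding.xlsx", sheet_name=0)

# Transform: a regular expression with named groups pulls a numeric id
# and a name out of a combined text column.
extracted = campaign_df["contact_info"].str.extract(
    r'"contact_id":\s*(?P<contact_id>\d+).*?"name":\s*"(?P<name>[^"]+)"'
)

# Convert the extracted id to a nullable integer and join it back on.
extracted["contact_id"] = pd.to_numeric(
    extracted["contact_id"], errors="coerce"
).astype("Int64")
campaign_df = campaign_df.join(extracted)
```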

Resources and Steps

  • Create an ETL pipeline that moves raw data to a SQL database.
  • Extract data from an external file by using Python and Pandas.
  • Use regular expressions to extract numbers and text.
  • Clean and transform data by using Python and Pandas.
  • Design a database and a table schema by using an entity relationship diagram (ERD); a sketch of the resulting schema appears after this list.
  • Load data into a PostgreSQL database.
  • Perform data analysis by using SQL queries.
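
The database design and load steps follow from the ERD, whose table definitions are written out as PostgreSQL DDL. The sketch below shows what that might look like when executed from Python with SQLAlchemy; the two tables, their columns, and the connection string are placeholders rather than the project's actual schema, and the remaining tables would be created the same way.

```python
# Minimal schema sketch: PostgreSQL DDL derived from an ERD, executed
# from Python with SQLAlchemy. Table names, columns, and the connection
# string are placeholders; only two of the four tables are shown.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://postgres:password@localhost:5432/crowdfunding_db"
)

schema_ddl = """
CREATE TABLE IF NOT EXISTS contacts (
    contact_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name  VARCHAR(50) NOT NULL,
    email      VARCHAR(100) NOT NULL
);

CREATE TABLE IF NOT EXISTS campaign (
    cf_id        INT PRIMARY KEY,
    contact_id   INT NOT NULL REFERENCES contacts (contact_id),
    company_name VARCHAR(100),
    goal         NUMERIC(12, 2),
    pledged      NUMERIC(12, 2),
    outcome      VARCHAR(20)
);
"""

# Run each statement of the DDL script inside a single transaction.
with engine.begin() as conn:
    for statement in schema_ddl.split(";"):
        if statement.strip():
            conn.execute(text(statement))
```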

The original project data is contained in a single large Excel file that has become too large for the company to manage effectively. I will therefore use the ETL process to extract the useful data from the Excel file into four separate .csv files. These .csv files will form the basis of a PostgreSQL database from which the company's data analytics team can perform analysis and create reports for company stakeholders as well as for individuals who donate to projects.
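
Once the cleaned .csv files exist, they can be appended to their matching tables and queried for analysis. In the sketch below, the file names, table names, and connection details are assumptions, and the aggregation query is only one example of the kind of report the analytics team could run.

```python
# Minimal load-and-query sketch; file names, table names, and the
# connection string are assumptions, not the project's actual values.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://postgres:password@localhost:5432/crowdfunding_db"
)

# Load each cleaned .csv file into its matching PostgreSQL table.
# Contacts are loaded before campaign so any foreign key is satisfied.
csv_to_table = [
    ("category.csv", "category"),
    ("subcategory.csv", "subcategory"),
    ("contacts.csv", "contacts"),
    ("campaign.csv", "campaign"),
]
for csv_file, table in csv_to_table:
    df = pd.read_csv(csv_file)
    df.to_sql(table, engine, if_exists="append", index=False)

# Example analysis: total amount pledged per campaign outcome.
query = """
SELECT outcome, COUNT(*) AS campaigns, SUM(pledged) AS total_pledged
FROM campaign
GROUP BY outcome
ORDER BY total_pledged DESC;
"""
print(pd.read_sql(query, engine))
```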

Results

Recommendations