Pandas
The script was written using Python 3.7 and Pandas version 0.25.1. There is no need to download the CSV files from the source; the script pulls them from the URLs provided.
To run the script, clone the repository, navigate to its directory, and run it with the python command in your shell:
python3 cons_etl.py
The script saves the two output CSV files to the working directory.
I made the assumption that the updated_dt value was referring to the modified_dt column from the con.csv file. However, each of the three files had a modified_dt column that held different dates.
Some of the modified_dt values precede the corresponding created_dt values and will need to be cleaned.
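As a rough sketch of what that cleaning could look like, the snippet below flags rows where modified_dt is earlier than created_dt. It is not part of cons_etl.py. The sample values are made up for illustration, and the repair step (falling back to created_dt) is only one possible assumption about how to handle such rows.

```python
import pandas as pd

# Hypothetical sample rows; the column names come from this README,
# the values are invented for illustration.
df = pd.DataFrame({
    "created_dt": ["2019-01-05 10:00:00", "2019-02-01 09:30:00", "2019-03-10 12:00:00"],
    "modified_dt": ["2019-01-06 08:00:00", "2019-01-15 09:30:00", "2019-03-10 12:00:00"],
})

# Parse both columns as datetimes; unparseable values become NaT.
for col in ("created_dt", "modified_dt"):
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Flag rows whose modification timestamp is earlier than the creation timestamp.
bad = df["modified_dt"] < df["created_dt"]

# One possible repair (an assumption, not the script's behavior):
# replace the inconsistent modified_dt with created_dt.
cleaned = df.copy()
cleaned.loc[bad, "modified_dt"] = cleaned.loc[bad, "created_dt"]
```

Keeping the boolean mask separate from the repair makes it possible to inspect or export the flagged rows before deciding how to fix them.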