The raw data file kickstarter_data_raw.json
was downloaded from: https://webrobots.io/kickstarter-datasets/
The raw file is over 1 GB and contains just over 200,000 lines, each holding a JSON object that represents one web-scrape request.
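Because the raw file holds one JSON object per line, it can be read one line at a time instead of being loaded whole. The following is a minimal sketch of that idea, not code taken from clean_data.py; the function names iter_records and count_records are hypothetical.

```python
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_records(path: str) -> int:
    """Count records while holding only one of them in memory at a time."""
    return sum(1 for _ in iter_records(path))
```

Calling count_records("kickstarter_data_raw.json") would walk the whole file this way, which keeps memory use flat regardless of file size.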
The cleaned data file kickstarter_data_final.csv
is the result of running all four steps of clean_data.py
on the raw dataset.
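The four steps described below can also be run back to back from a small driver script. This is a sketch under the assumption that the steps must run in order 1 through 4 and that clean_data.py sits in the current directory; build_commands and run_all are hypothetical helper names.

```python
import subprocess
import sys
from typing import List

STEPS = (1, 2, 3, 4)  # step numbers accepted by clean_data.py


def build_commands(script: str = "clean_data.py") -> List[List[str]]:
    """Build one command per step, using the current Python interpreter."""
    return [[sys.executable, script, str(step)] for step in STEPS]


def run_all(script: str = "clean_data.py") -> None:
    """Run every step in order and stop at the first one that fails."""
    for command in build_commands(script):
        subprocess.run(command, check=True)
```

With check=True, a failing step raises subprocess.CalledProcessError, so later steps never run on incomplete output.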
python clean_data.py 1
executes step 1 and produces kickstarter_data_transformed.csv.
This step translates the raw data into a CSV file and removes scraping metadata. It is quite slow and may take 1-2 hours to execute.
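To illustrate what a JSON-to-CSV translation of this kind can look like, here is a minimal sketch using only the standard library. It is not the code in clean_data.py. It assumes each line wraps the project fields in a "data" object next to scraper metadata, and it takes the column set from the first record; both are assumptions for illustration, as is the function name json_lines_to_csv.

```python
import csv
import json


def json_lines_to_csv(src_path, dst_path, payload_key="data"):
    """Write the payload of each JSON line as a CSV row, discarding wrapper fields."""
    writer = None
    rows = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            payload = record.get(payload_key, {})  # assumed wrapper key
            # Nested objects are stored as JSON text so each stays in one cell.
            row = {key: json.dumps(value) if isinstance(value, (dict, list)) else value
                   for key, value in payload.items()}
            if writer is None:
                # Columns come from the first record (simplifying assumption);
                # fields that appear only in later records are ignored.
                writer = csv.DictWriter(dst, fieldnames=list(row),
                                        extrasaction="ignore", restval="")
                writer.writeheader()
            writer.writerow(row)
            rows += 1
    return rows
```

Streaming line by line keeps memory use low, at the cost of fixing the header before all records have been seen.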
python clean_data.py 2
executes step 2 and produces kickstarter_data_transformed.csv.
This step drops all obviously useless columns.
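Dropping columns amounts to filtering the keys of each row. The sketch below shows the idea on rows held as dictionaries; the column names are hypothetical, since the columns removed by clean_data.py are not listed here.

```python
def drop_columns(rows, unwanted):
    """Return copies of the rows without the named columns."""
    unwanted = set(unwanted)
    return [{key: value for key, value in row.items() if key not in unwanted}
            for row in rows]


# Hypothetical column names, chosen only for illustration.
sample = [{"id": "1", "name": "A", "photo": "{...}", "currency_symbol": "$"}]
pruned = drop_columns(sample, ["photo", "currency_symbol"])
print(pruned)  # [{'id': '1', 'name': 'A'}]
```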
python clean_data.py 3
executes step 3 and produces kickstarter_data_pruned.csv.
This step generates new, potentially useful columns derived from existing columns.
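As an example of a derived column, a campaign's length in days could be computed from its start and end times. This is only a sketch: it assumes columns named launched_at and deadline holding Unix timestamps in seconds, and the duration_days column is hypothetical rather than one clean_data.py is known to create.

```python
SECONDS_PER_DAY = 86_400


def add_duration_days(row):
    """Return a copy of the row with a derived campaign length in days."""
    derived = dict(row)
    # launched_at and deadline are assumed to be Unix timestamps in seconds.
    elapsed = int(row["deadline"]) - int(row["launched_at"])
    derived["duration_days"] = elapsed / SECONDS_PER_DAY
    return derived


example = add_duration_days({"launched_at": "1500000000", "deadline": "1502592000"})
print(example["duration_days"])  # 30.0
```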
python clean_data.py 4
executes step 4 and produces kickstarter_data_final.csv.
This step drops further irrelevant columns and removes the roughly 1,000 rows with missing data.
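Removing rows with missing data can be sketched as a filter over the rows. The definition of "missing" used here (None or an empty string) is an assumption; clean_data.py may apply a different rule.

```python
def drop_incomplete(rows):
    """Keep only rows in which every field has a non-empty value."""
    def is_missing(value):
        # Assumed definition of missing: None or blank text.
        return value is None or str(value).strip() == ""

    return [row for row in rows
            if not any(is_missing(value) for value in row.values())]
```

Comparing len(rows) before and after the call gives the number of rows removed, which is a quick check against the figure quoted above.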