Raw data files are stored in the `Raw Data` folder. The data file we used is `Kickstarter.csv`.
The raw data files from `Raw Data` have been cleaned using `clean_data.ipynb`, and the cleaned datasets are saved into the `Clean_data` folder.
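A minimal sketch of the cleaning step, assuming the raw CSV has columns like `state`, `name`, `goal`, and `deadline` (these names and the filters are assumptions; the actual logic lives in `clean_data.ipynb`):

```python
# Hypothetical cleaning sketch; column names and filters are assumptions.
import pandas as pd

df = pd.read_csv("Raw Data/Kickstarter.csv")

# Keep only finished campaigns and drop rows missing key fields.
df = df[df["state"].isin(["successful", "failed"])]
df = df.dropna(subset=["name", "goal", "deadline"])

df.to_csv("Clean_data/Kickstarter_clean.csv", index=False)  # assumed output name
```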
We scraped the extra text features required using these files (a scraping sketch follows the list):
- `Scrap Story.ipynb`
- `Scrape Comments.ipynb`
- `Scrape FAQ.ipynb`
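A sketch of the scraping approach with `requests` and BeautifulSoup, assuming a `urls` column and a `div.story-content` selector (both hypothetical; the notebooks above contain the actual scrapers):

```python
# Hypothetical scraping sketch; the URL column and CSS selector are assumptions.
import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_story(url: str) -> str:
    """Fetch a project page and return its story text, or '' on failure."""
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        return ""
    soup = BeautifulSoup(resp.text, "html.parser")
    story = soup.select_one("div.story-content")  # assumed selector
    return story.get_text(" ", strip=True) if story else ""

df = pd.read_csv("Clean_data/Kickstarter_clean.csv")  # assumed file name
df["story"] = df["urls"].apply(scrape_story)          # assumed URL column
```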
Scraped text data from the datasets in `Clean_data` will be uploaded to the `Output` folder. The outputs are:
- Comments
- FAQ
- Update
Once all the relevant data had been scraped, we merged everything back into one dataset. This is done with `combined_scraped_data.py`, and the output dataset is saved as `Combined_dataset.csv`.
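A merge sketch, assuming each scraped output is a CSV keyed by a shared `id` column (file names and the join key are assumptions; `combined_scraped_data.py` performs the real merge):

```python
# Hypothetical merge sketch; file names and the join key are assumptions.
import pandas as pd

combined = pd.read_csv("Clean_data/Kickstarter_clean.csv")
for name in ("Comments", "FAQ", "Update"):
    extra = pd.read_csv(f"Output/{name}.csv")
    combined = combined.merge(extra, on="id", how="left")  # assumed 'id' key

combined.to_csv("Combined_dataset.csv", index=False)
```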
We performed EDA on the dataset with `EDA.ipynb`.
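A quick look at the kind of checks involved, assuming a `state` target column (the full analysis is in `EDA.ipynb`):

```python
# Hypothetical EDA sketch; the target column name is an assumption.
import pandas as pd

df = pd.read_csv("Combined_dataset.csv")
print(df.shape)
print(df["state"].value_counts(normalize=True))              # class balance
print(df.isna().mean().sort_values(ascending=False).head())  # missingness
```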
We performed feature preprocessing on the dataset with `Features_Preprocessing.ipynb`.
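A preprocessing sketch combining TF-IDF on the scraped text with scaling on numeric features, assuming `story`, `goal`, and `backers_count` columns (the actual pipeline is in `Features_Preprocessing.ipynb`):

```python
# Hypothetical preprocessing sketch; all column names are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Combined_dataset.csv")

preprocess = ColumnTransformer([
    ("story_tfidf", TfidfVectorizer(max_features=5000), "story"),
    ("numeric", StandardScaler(), ["goal", "backers_count"]),
])

X = preprocess.fit_transform(df)
y = (df["state"] == "successful").astype(int)  # assumed target encoding
```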
Our models are (a training sketch follows the list):
- Logistic Regression (`Logreg.ipynb`)
- Decision Tree (`Decision Tree Final.ipynb`)
- Random Forest (`Random_Forest.ipynb`)
- SVM (`TJ_SVM-update.ipynb`)
- Neural Network (`Neural Network.ipynb`)
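A sketch of how one of these models is trained and evaluated, using Logistic Regression as the example (feature and target columns are assumptions; each notebook above trains its own model in full):

```python
# Hypothetical training sketch; feature and target columns are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("Combined_dataset.csv")
X = df[["goal", "backers_count"]]              # assumed numeric features
y = (df["state"] == "successful").astype(int)  # assumed target encoding

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```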
Trained models are saved to output files as `{model}.pckl`.
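How a fitted model would be pickled and reloaded under that naming scheme (the stand-in model and the lowercase file name are assumptions):

```python
# Hypothetical save/load sketch following the {model}.pckl naming.
import pickle
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # stand-in fitted model

with open("logreg.pckl", "wb") as f:  # assumed file name
    pickle.dump(model, f)

with open("logreg.pckl", "rb") as f:
    restored = pickle.load(f)
```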
We performed stacking on all our models in `Ensemblestacking.ipynb`.
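A stacking sketch with scikit-learn's `StackingClassifier`, mirroring the model list above (the exact base learners, hyperparameters, and meta-learner live in `Ensemblestacking.ipynb`; the meta-learner here is an assumption):

```python
# Hypothetical stacking sketch; hyperparameters and meta-learner are assumptions.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

stack = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier()),
        ("forest", RandomForestClassifier()),
        ("svm", SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
# Fit and score as in the training sketch above:
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```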