Kickstarter Project

The second project from our Data Science Bootcamp explores crowdfunding by analyzing a Kickstarter data set.

The data set contains about 200,000 projects from 22 countries, covering the period from 2009 to 2019.

Kaggle Dataset(s): https://www.kaggle.com/kemical/kickstarter-projects

This project was done with a modified dataset, which cannot be uploaded here due to its size. The file "KickstarterData_full" contains all the data used, after cleaning of the original files.


Goal: How to raise money with crowdfunding?

  • Recommendations and Insights for crowdfunding projects
  • Predicting a project's chances of success

Business questions:

  • What does the average project on Kickstarter look like?
  • What can you expect with a specific project?
  • Which factors are important for success?
  • Which machine learning model is best at predicting success?

Recommendations:


Model results:

We tried the following models: k-nearest neighbors (KNN), Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF).

Score    KNN    DT     LR     RF
Recall   1      0.88   0.51   0.87
AUC      0.58   0.67   0.65   0.7

→ Random Forest is our best model: it reaches the highest AUC (0.7) while keeping recall high (0.87)
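The comparison above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes synthetic data from `make_classification` in place of the cleaned Kickstarter features, default hyperparameters, and average precision as one common summary of the precision-recall curve (the README does not state how the AUC was computed). The scores it prints will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned Kickstarter features (assumption):
# label 1 = successful project, label 0 = failed project.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# The four model families named in the README, with default settings.
models = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels for recall
    y_score = model.predict_proba(X_test)[:, 1]  # P(success) for the PR summary
    results[name] = {
        "recall": recall_score(y_test, y_pred),
        "pr_auc": average_precision_score(y_test, y_score),
    }

for name, scores in results.items():
    print(f"{name}: recall={scores['recall']:.2f}, PR AUC={scores['pr_auc']:.2f}")
```

Fitting all models on the same split keeps the scores comparable. In a real run, cross-validation would give a more stable ranking.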

  • We try to minimize false negatives, which means aiming for high recall
  • High recall is easy to achieve on its own, so we also need a second metric to balance it
  • For this we use the area under the precision-recall curve (AUC)
  • Limitation: The data does not show how much marketing each project did for its campaign, although this could be a very important success factor.
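To illustrate why recall alone is not enough, here is a small sketch with made-up labels (not project data). A "model" that predicts success for every project reaches perfect recall, yet its precision is only the share of successful projects. Average precision is used here as one common way to summarize the precision-recall curve; that choice is an assumption, since the README does not say how the AUC was computed.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_score, recall_score

# Hypothetical labels: 1 = successful, 0 = failed (4 of 10 projects succeed).
y_true = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])

# A trivial "model" that predicts success for every project.
y_pred_all = np.ones_like(y_true)
recall_all = recall_score(y_true, y_pred_all)        # 4 of 4 positives found
precision_all = precision_score(y_true, y_pred_all)  # 4 correct out of 10 predictions

# The same trivial model expressed as scores: every project gets the same score.
constant_scores = np.full(len(y_true), 0.5)
ap_constant = average_precision_score(y_true, constant_scores)

# Scores that rank every successful project above every failed one.
ranking_scores = np.where(y_true == 1, 0.9, 0.1)
ap_ranking = average_precision_score(y_true, ranking_scores)

print(f"predict-all: recall={recall_all:.2f}, precision={precision_all:.2f}")
print(f"average precision: constant={ap_constant:.2f}, ranking={ap_ranking:.2f}")
```

The constant scorer cannot separate the classes, so its average precision falls to the base rate of successful projects, while the scorer that ranks well is rewarded. This is the balancing effect described in the list above.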

In this repository you can find the following files:

  • Jupyter Notebook(s)
  • Presentation
  • Dataset(s)

Requirements

  • Matplotlib
  • scikit-learn (imported as sklearn)
  • Pandas
  • NumPy
  • Seaborn
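A quick way to check that these libraries are available before opening the notebook might look like the sketch below. It uses only the standard library. The import names are the usual ones for each package, and no specific versions are assumed, since the README does not list any.

```python
from importlib.util import find_spec

# Import names for the libraries listed above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["matplotlib", "sklearn", "pandas", "numpy", "seaborn"]

# find_spec returns None when a top-level package cannot be found.
status = {name: find_spec(name) is not None for name in REQUIRED}
missing = [name for name, installed in status.items() if not installed]

print("Missing packages:", ", ".join(missing) if missing else "none")
```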

Steps

  1. Imports
  2. Data Overview
  3. Data Cleaning
  4. Exploratory Data Analysis
  5. Data Preparation
  6. Machine Learning Models
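Steps 3 and 5 could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's code: the sample rows are made up, and the column names (`state`, `launched`, `deadline`, `goal`, `main_category`, `country`) are hypothetical and may differ from those in "KickstarterData_full".

```python
import pandas as pd

# Tiny made-up sample; all column names are assumptions.
raw = pd.DataFrame({
    "name": ["Board game", "Indie film", "Smart mug", "Indie film"],
    "main_category": ["Games", "Film & Video", "Technology", "Film & Video"],
    "country": ["US", "GB", "DE", "GB"],
    "goal": [5000.0, 20000.0, 15000.0, 20000.0],
    "launched": ["2018-01-10", "2017-05-01", "2019-03-15", "2017-05-01"],
    "deadline": ["2018-02-09", "2017-06-30", "2019-04-14", "2017-06-30"],
    "state": ["successful", "failed", "live", "failed"],
})

# Step 3, data cleaning: drop exact duplicates and keep only finished campaigns.
df = raw.drop_duplicates()
df = df[df["state"].isin(["successful", "failed"])].copy()

# Step 5, data preparation: binary target and campaign duration in days.
df["success"] = (df["state"] == "successful").astype(int)
df["launched"] = pd.to_datetime(df["launched"])
df["deadline"] = pd.to_datetime(df["deadline"])
df["duration_days"] = (df["deadline"] - df["launched"]).dt.days

# One-hot encode the categorical columns so the models receive numeric input.
X = pd.get_dummies(
    df[["goal", "duration_days", "main_category", "country"]],
    columns=["main_category", "country"],
)
y = df["success"]

print(X.shape, y.tolist())
```

Restricting the target to finished campaigns avoids labeling a still-running project as a failure. Whether the notebook makes the same choice is not stated in this README.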