/kaggle-titantic

Primary language: Jupyter Notebook

Kaggle Titanic

Code and writing for submissions made to the Kaggle Titanic competition, as part of a course project for STAT5302: Applied Regression Analysis.

Tasks:

  • Clean the "ticket" column to create a factor for the ticket type and a continuous (?) column for the number
  • Do some kind of feature selection process to include only relevant variables (e.g. via AIC)
  • Do some kind of search for second-order and higher-order interaction effects between variables
  • Search for non-linear relationships between the variables and the response (e.g. polynomial terms, log/power transforms)
  • Consider some kind of feature derived from passenger names
  • Handle missing data in continuous columns
  • Handle missing data in factor columns
  • Do something with the Cabin information
  • Create cross-validation split for use by team (?)
  • Try other model approaches:
      • Decision tree
      • xgboost (gradient boosted decision tree)
      • Python sklearn for ensemble model
      • Look up how to do ensemble models in R?
      • Neural approach?
  • Write report:
      • Abstract
      • Short description of preliminary data study
      • Detailed reasoning of the model/method chosen
      • Explanations of other models and why they were inferior
      • Short conclusion
      • Supplemental document
      • References
      • Code
      • Screenshot of score in leaderboard
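
As a starting point for the "ticket" and passenger-name tasks above, here is a minimal sketch in plain Python. The helper names `split_ticket` and `extract_title` are hypothetical, and the parsing rules (the last whitespace-separated token is the ticket number, the title sits between the comma and the first period of the name) are assumptions about the data rather than anything the team has settled on.

```python
import re


def split_ticket(ticket):
    """Split a raw ticket string into (prefix, number).

    prefix: text before the final numeric token, with punctuation removed
            and upper-cased, or "NONE" when the ticket is only a number.
    number: the final numeric token as an int, or None if there is none.
    """
    parts = ticket.strip().split()
    number = None
    if parts and parts[-1].isdecimal():
        number = int(parts.pop())
    # Drop punctuation so variants such as "A/5" and "A.5." share one level.
    prefix = re.sub(r"[^A-Za-z0-9]", "", "".join(parts)).upper()
    return (prefix or "NONE", number)


def extract_title(name):
    """Return the honorific between the comma and the first period."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


if __name__ == "__main__":
    for raw in ["A/5 21171", "STON/O2. 3101282", "113803", "LINE"]:
        print(raw, "->", split_ticket(raw))
    print(extract_title("Braund, Mr. Owen Harris"))
```

The prefix could then be treated as a factor and the number as a numeric column. Whether the number is meaningful as a continuous predictor is still the open question flagged in the task list.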

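For the shared cross-validation split, one option is to assign each passenger to a fold deterministically, so that every team member compares models on the same folds. The sketch below uses only the standard library. The function name `make_folds`, the choice of 5 folds, and the seed value are illustrative assumptions, and the split is not stratified by the response.

```python
import random


def make_folds(passenger_ids, k=5, seed=5302):
    """Map each passenger id to a fold index in 0..k-1.

    Ids are sorted before shuffling so the result depends only on the
    set of ids and the seed, not on the order they were read in.
    """
    ids = sorted(passenger_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled ids round-robin so fold sizes differ by at most one.
    return {pid: i % k for i, pid in enumerate(ids)}


if __name__ == "__main__":
    # Assumes the training set uses PassengerId values 1..891.
    folds = make_folds(range(1, 892))
    sizes = [sum(1 for f in folds.values() if f == j) for j in range(5)]
    print(sizes)
```

The resulting mapping could be written to a file in the repository and joined onto the training data by passenger id, in either R or Python.
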
Finished tasks:

  • Create Kaggle team
  • Create a GitHub account and clone the repository to your local computer