
Clear Cut Solution - https://www.kaggle.com/c/forest-cover-type-prediction


mids-w207-final-project

Primary Files:

  1. exploratory_data_analysis.ipynb: Jupyter notebook with a detailed analysis of the training data
  2. feature_engineering.py: Python library containing all transformations
  3. models.py: Python library containing all models and configurations
  4. clear_cut_solution.ipynb: Jupyter notebook with descriptions, solutions and test results

Repo Map

  • README.md
    • Project introduction, file structure, environment instructions
  • exploratory_data_analysis.ipynb
    • Distributions, visualizations, sanity checks, correlations, etc.
  • clear_cut_solution.ipynb
    • Formal project implementation with feature engineering, training, evaluation and testing
  • feature_engineering.py and models.py
    • Libraries of feature engineering and model functions consumed by clear_cut_solution.ipynb; they also contain experimental code not included in the final project.
  • ./data
    • Notebook diagrams, training data, testing data
  • ./submissions
    • Test output files (csv) to be uploaded on Kaggle
  • ./backups
    • HTML, Markdown, and Python versions of the clear_cut_solution notebook
  • ./comp_setup
    • Details of custom container creation
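The files in ./submissions follow the Kaggle competition's submission layout: one row per test example with an `Id` and a predicted `Cover_Type` (an integer class from 1 to 7). The sketch below shows one way to write such a file using only the standard library. It is not taken from the project code: the function name `write_submission` is hypothetical, and it assumes predictions are already integer class labels.

```python
import csv
from pathlib import Path

# Forest cover type labels in this competition are the integers 1 through 7.
VALID_COVER_TYPES = range(1, 8)


def write_submission(ids, predictions, path):
    """Hypothetical helper: write an Id,Cover_Type CSV for upload to Kaggle."""
    ids = [int(i) for i in ids]
    predictions = [int(p) for p in predictions]
    # Validate before touching the filesystem so bad input creates no file.
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    bad = [p for p in predictions if p not in VALID_COVER_TYPES]
    if bad:
        raise ValueError(f"invalid Cover_Type values: {bad}")

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ./submissions
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Cover_Type"])  # header row expected by Kaggle
        writer.writerows(zip(ids, predictions))
    return path
```

For example, `write_submission(test_ids, model_preds, "submissions/my_model.csv")` would produce a file ready to upload, where `test_ids` and `model_preds` stand in for whatever the notebook produces.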

Computing Environment

Work was conducted in the kmartcontainers/207final container (Docker Hub). It is a custom container that adds the xgboost library to the jupyter/tensorflow-notebook Docker image maintained by the Jupyter development team. Instructions for running the container on your own machine or on GCP, along with details of how the container was created, are in the comp_setup/ComputeSetup.md file.
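Before running the notebooks, it can help to confirm that the current environment provides the expected libraries. The sketch below is an assumption, not part of the repo: the default package list is a guess based on the README (xgboost plus the TensorFlow/SciPy stack of the base image), and `missing_packages` is a hypothetical helper name. It only checks that each top-level package can be found, without importing it.

```python
import importlib.util

# Assumed package list: xgboost is named in the README; the others are
# typical of the jupyter/tensorflow-notebook base image.
DEFAULT_PACKAGES = ("numpy", "pandas", "sklearn", "tensorflow", "xgboost")


def missing_packages(names=DEFAULT_PACKAGES):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are available.")
```

Inside the container this should report nothing missing; on a bare Python install it lists what would need to be installed.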