/cmuDSC

Primary language: TeX. License: MIT.

Third-place repository for the CMU Data Science Cup 2016

Directory

  • data: the raw data and the intermediate datasets
  • models: where training takes place
  • report: the two-page report and its tex source
  • scripts: the processing scripts
  • visu: exploratory visualizations made beforehand

Data Cleaning

  • Removed rows with a quantity of 0
  • Normalized days so that the first day is 0
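
The two cleaning steps above could look roughly like the sketch below. This is not the code from the scripts directory: the column names `quantity` and `day` and the helper `clean` are hypothetical, and it assumes that the day offset is computed after the zero-quantity rows are dropped.

```python
import pandas as pd


def clean(df, quantity_col="quantity", day_col="day"):
    """Drop zero-quantity rows, then shift days so the earliest one is 0.

    Column names are placeholders, not the names used in the real dataset.
    """
    # Keep only rows where something was actually bought.
    out = df[df[quantity_col] != 0].copy()
    # Subtract the smallest remaining day so the series starts at 0.
    out[day_col] = out[day_col] - out[day_col].min()
    return out


# Tiny made-up frame, only to show the call.
raw = pd.DataFrame({"day": [5, 6, 7, 9], "quantity": [2, 0, 1, 3]})
cleaned = clean(raw)
print(cleaned)
```

The function returns a copy, so the raw frame stays available for comparison.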

Feature Engineering

Insert columns for:

  • Price per Product
  • Number of Eggs
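
As a rough illustration of adding these two columns with Pandas, the sketch below assumes that price per product is the row's total price divided by its quantity, and that the number of eggs is the quantity times a pack size. Both formulas, the column names (`price`, `quantity`, `eggs_per_pack`) and the helper `add_features` are assumptions made for the example; the repository's scripts may compute them differently.

```python
import pandas as pd


def add_features(df, price_col="price", quantity_col="quantity",
                 pack_size_col="eggs_per_pack"):
    """Add price-per-product and number-of-eggs columns.

    The formulas and column names are assumptions for illustration only.
    """
    out = df.copy()
    # Assumed: total price of the row divided by the number of products.
    # Rows with quantity 0 are expected to have been removed already.
    out["price_per_product"] = out[price_col] / out[quantity_col]
    # Assumed: each product is a pack containing a known number of eggs.
    out["number_of_eggs"] = out[quantity_col] * out[pack_size_col]
    return out


# Tiny made-up frame, only to show the call.
sales = pd.DataFrame({
    "price": [6.0, 9.0],
    "quantity": [2, 3],
    "eggs_per_pack": [12, 6],
})
features = add_features(sales)
print(features)
```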

Dependencies

Python

  • Numpy
  • Pandas
  • Matplotlib
  • Sklearn
  • Patsy

R

  • RCurl