DSx

Hands-on tutorials demonstrating prediction engineering, feature engineering, and automation in data science. In a series of notebooks, we show how to build predictive models from raw data within a day, all using open source software.

Open source tools used

  • pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • Featuretools is a DARPA-sponsored open source library that enables data scientists to automatically extract features from time-varying data.
  • scikit-learn is a free software machine learning library for the Python programming language.

Concepts to learn

  • Prediction engineering
  • Feature engineering
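
As a rough illustration of how the two concepts fit together, the sketch below uses pandas on a tiny made-up transaction log. The table, its column names, the cutoff date, and the helper functions `make_labels` and `make_features` are all assumptions invented for this example; they are not taken from the notebooks. Prediction engineering here means deriving a label from what happens on or after a cutoff time, and feature engineering means aggregating only the data from before that cutoff, so no future information leaks into the features.

```python
import pandas as pd

# Hypothetical raw transaction log; names and values are illustrative only.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "time": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-10",
        "2024-01-12", "2024-01-25", "2024-02-15",
    ]),
    "amount": [20.0, 35.0, 50.0, 15.0, 10.0, 40.0],
})

# Assumed cutoff: features come from before it, labels from on or after it.
cutoff = pd.Timestamp("2024-02-01")
customers = sorted(transactions["customer_id"].unique().tolist())


def make_labels(df, cutoff, customers):
    """Prediction engineering: did the customer purchase on or after the cutoff?"""
    future = df[df["time"] >= cutoff]
    buyers = set(future["customer_id"])
    return pd.Series(
        [c in buyers for c in customers], index=customers, name="will_purchase"
    )


def make_features(df, cutoff, customers):
    """Feature engineering: aggregate only rows strictly before the cutoff."""
    past = df[df["time"] < cutoff]
    feats = past.groupby("customer_id")["amount"].agg(
        total_spent="sum", n_purchases="count"
    )
    # Customers with no history before the cutoff get zeros instead of NaN.
    return feats.reindex(customers).fillna(0)


labels = make_labels(transactions, cutoff, customers)
features = make_features(transactions, cutoff, customers)
```

Because `features` and `labels` share the same customer index, they could be passed as `X` and `y` to a scikit-learn estimator. Featuretools is aimed at generating aggregate features like these automatically rather than by hand.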

Notebooks

  • NYC-Taxi-Dataset - Learn feature engineering
  • Retail-Dataset - Learn prediction engineering

Installation

Linux

sh install_linux.sh
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook

Mac

sh install_osx.sh
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook
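
Verifying the environment

After installing, a quick check like the one below can confirm that the tools listed above are importable from the active virtual environment. This is only a sketch: it assumes `requirements.txt` installs pandas, Featuretools, and scikit-learn, and the helper `missing_packages` is a hypothetical name introduced here, not part of this repository.

```python
import importlib.util

# Import names of the tools named in this README.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "featuretools", "sklearn"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it with the virtual environment activated; if anything is reported missing, rerunning `pip install -r requirements.txt` is the first thing to try.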