/k9

Self-Taught Data Science

Self-Taught Data Science Playground

This repository collects my self-study notebooks on data science theory and practice.

All the notebooks are hosted as nicely rendered HTML on my website, Hello, Data Science!

Notebooks Summary

notebooks/

Each notebook is written in either Jupyter or R Markdown. Most notebooks use Python, R, or both. Some use the two languages together in a single notebook, which is made possible by reticulate.

Laboratory Scripts

labs/

These are quick-and-dirty scripts that explore a variety of open source machine learning tools. They may be incomplete and messy to read.

[Optional] Setup Python Environment

To ensure reproducibility, it is recommended to use pyenv along with pyenv-virtualenv to control both the Python version and the package versions.

pyenv supports only Linux and macOS. For Windows users, conda is recommended instead.

Install a Specific Python Version

To use a virtualenv with reticulate in Rmd, the Python interpreter must be built with its shared library enabled:

PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.7.0
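After the install, it may be worth confirming that the interpreter really was built with the shared library. The snippet below is a minimal sketch, not part of this repository: the function name `built_with_shared_library` is hypothetical, and it relies on the `Py_ENABLE_SHARED` build variable exposed by the standard `sysconfig` module. Treat the result as a heuristic, since the variable can be unset on some platforms (Windows, for example).

```python
import sysconfig


def built_with_shared_library() -> bool:
    """Return True when this interpreter reports a shared-library build.

    Assumption: Py_ENABLE_SHARED is 1 for --enable-shared builds and
    0 or None otherwise. The function name is illustrative only.
    """
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    print("shared library build:", built_with_shared_library())
```

Run it with the pyenv-managed interpreter, i.e. the one that reticulate will be pointed at.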

Create virtualenv

Each notebook has different package dependencies. Here is an example that creates an environment specific to the notebook on model explainability:

cd notebooks/ml/model_explain
pyenv virtualenv 3.7.0 k9-model-explain
pyenv local k9-model-explain
pip install --upgrade pip
pip install -r requirements.txt
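To check that the interpreter picked up in the notebook directory is the one from the new environment, a small script along these lines can help. It is a sketch under assumptions: the helper name `in_virtualenv` is hypothetical, and the check compares `sys.prefix` with `sys.base_prefix` (set by `venv`) or `sys.real_prefix` (set by older versions of the `virtualenv` package).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Assumption: venv-style environments make sys.base_prefix differ from
    sys.prefix, and legacy virtualenv sets sys.real_prefix instead.
    """
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix
    )
    return base_prefix != sys.prefix


if __name__ == "__main__":
    # The executable path should point inside the pyenv virtualenv.
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Running it from `notebooks/ml/model_explain` after `pyenv local k9-model-explain` should print an interpreter path that belongs to that environment.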

TODO

Topics

  • Machine Learning
    • Factorization Machines
    • Gradient Boosting Trees
    • Recurrent Neural Nets
    • Sequence-to-Sequence Models
    • GANs
    • Reinforcement Learning Basics
  • Statistics
    • Linear and Logistic Models: Econometrics vs. Machine Learning
    • Naive Bayes
    • Bootstrap Sampling
    • Bayesian Model Diagnostics
  • Tools/Programming
    • TensorFlow 2.0 Hands-On
    • MXNet Hands-On
    • RASA Chatbot Framework Hands-On
  • Programming
    • R
      • Production Quality Shiny App Development
    • Python
      • Dash for Interactive Dashboarding
  • Projects
    • Model Deployment with gRPC

Site

  • Dockerize each notebook (for complete reproducibility and portability)?
  • Tidy up dependencies for each notebook