This repository is a collection of my self-study notebooks on data science theory and practice.
Visit my website Hello, Data Science! for nicely rendered HTML versions of all the notebooks.
notebooks/
Each notebook is written as either a Jupyter notebook or an R Markdown document.
Most notebooks use Python, R, or both.
Some notebooks use the two languages together in a single document, which is made possible by reticulate.
- Statistics
- Machine Learning
- Natural Language Understanding
- On Subword Units
- Context-Free Word Embeddings (W.I.P.)
- Context-Aware Word Embeddings
- Programming
- Projects
- YouTube-8M Multi-Label Video Classification
- A General-Purpose Neural Ranking Model (W.I.P.)
labs/
These are quick-and-dirty scripts for exploring a variety of open source machine learning tools. They may be incomplete and messy to read.
To ensure reproducibility, it is recommended to use pyenv along with pyenv-virtualenv to control both the Python version and the package versions.
pyenv supports only Linux and macOS.
For Windows users, it is recommended to use conda instead.
To use a virtualenv with reticulate in R Markdown, the Python interpreter involved must be built with a shared library:
PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.7.0
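After installing, it can be worth confirming that the interpreter really was built with a shared library before pointing reticulate at it. The following is a minimal sketch using only the standard library; the helper name `is_shared_build` is hypothetical, and it assumes a pyenv-style build on Linux or macOS, where the `Py_ENABLE_SHARED` build variable is recorded (on other platforms it may be missing, in which case the check simply reports `False`).

```python
import sysconfig


def is_shared_build() -> bool:
    """Return True if this interpreter reports a --enable-shared build.

    Hypothetical helper for illustration, not part of pyenv or reticulate.
    """
    # Py_ENABLE_SHARED is typically 1 for shared builds and 0 otherwise on
    # Linux/macOS; it may be None where the variable is not recorded.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    print("shared library build:", is_shared_build())
    # Library file name recorded at build time, if any
    # (usually a .so or .dylib for shared builds, a .a for static ones).
    print("LDLIBRARY:", sysconfig.get_config_var("LDLIBRARY"))
```

Run it with the interpreter that the virtualenv will use. If it reports `False`, reinstalling that Python version with the `--enable-shared` option shown above is the likely fix.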
Each notebook has its own package dependencies. Here is an example that creates an environment specific to the notebook on model explainability:
cd notebooks/ml/model_explain
pyenv virtualenv 3.7.0 k9-model-explain
pyenv local k9-model-explain
pip install --upgrade pip
pip install -r requirements.txt
- Machine Learning
- Factorization Machines
- Gradient Boosting Trees
- Recurrent Neural Nets
- Sequence-to-Sequence Models
- GANs
- Reinforcement Learning Basics
- Statistics
- Linear and Logistic Models: Econometrics vs. Machine Learning
- Naive Bayes
- Bootstrap Sampling
- Bayesian Model Diagnostics
- Tools/Programming
- TensorFlow 2.0 Hands-On
- MXNet Hands-On
- RASA Chatbot Framework Hands-On
- Programming
- R
- Production Quality Shiny App Development
- Python
- Dash for Interactive Dashboarding
- R
- Projects
- Model Deployment with gRPC
- Dockerize each notebook (for complete reproducibility and portability)?
- Tidy up dependencies for each notebook