Examples and illustrations of basic statistical concepts, probability distributions, Monte Carlo simulation, preprocessing and visualization techniques, and statistical testing.
This repo is divided into multiple sections. Each section focuses on a group of concepts, operations, or data science toolkits. A brief summary is presented below (in alphabetical order):
- algorithm: rudimentary algorithms and data structures, using LeetCode problems for practice. Solutions are provided and commented. Problems are grouped by subject (BFS, DFS, trees, etc.).
- case_study: small datasets that exemplify a regression/classification workflow, from data cleaning to feature engineering, modeling, training, and evaluation.
- cheatsheet: useful derivations of commonly used formulas, for future review.
- data_structure: standalone overviews of data structures in Python (union-find, trie, etc.).
- distribution: common probability distributions, with their PDFs, CDFs, and simulation methods.
- handy_syntax: Python tricks that are often useful but hard to memorize.
- keras: basic use cases of Keras, the high-level TensorFlow API.
- models: more in-depth study of specific ML models.
- preprocessing: data cleaning, smoothing, pipelining, etc.
- simulation: interesting simulation experiments, as brain exercises.
- statistic_test: a catalogue of commonly used statistical tests and their implementations in Python.
- tensorflow: plain TensorFlow for deep learning tasks.
- training: tricks to speed up training of TensorFlow models.
- unittest: standard protocols for Python unit tests.
- unix: notes for keeping track of Python environments so they can be restored easily.
- visualization: example use cases of matplotlib, plotly, ggplot, and animation.
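
To give a flavor of what the distribution and simulation sections cover, here is a minimal sketch of one simulation method: inverse transform sampling of an exponential distribution, checked against its theoretical mean and CDF. The function names `sample_exponential` and `empirical_cdf` are hypothetical and chosen for illustration only; they are not taken from the code in this repo.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n samples from Exponential(rate) by inverse transform sampling.

    If U ~ Uniform(0, 1), then X = -ln(1 - U) / rate has the CDF
    F(x) = 1 - exp(-rate * x), i.e. X is exponentially distributed.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)  # local generator, so results are reproducible
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(s <= x for s in samples) / len(samples)


if __name__ == "__main__":
    rate = 2.0
    samples = sample_exponential(rate, n=100_000)
    print("sample mean:", sum(samples) / len(samples), "theory:", 1 / rate)
    print("empirical CDF at 0.5:", empirical_cdf(samples, 0.5),
          "theory:", 1 - math.exp(-rate * 0.5))
```

With a large sample, the printed sample mean and empirical CDF should land close to the theoretical values, which is the kind of sanity check a Monte Carlo experiment typically starts with.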