/analysis

Repo for practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis

Primary Language: Jupyter Notebook

ANALYSIS

As the DS application demo part of the "DaaS (Data as a Service) repo", this repo mainly uses Jupyter notebooks to show step-by-step analysis and ML/DL approaches on various data science subjects. The idea is to demo how a data scientist deals with a new dataset: pre-process the data, do exploratory data analysis (EDA), then run a suitable model and offer suggestions that are feasible for the business and carry acceptable statistical error (i.e. DS workflow: business understanding -> data preprocessing -> EDA -> data understanding -> analysis/modeling). Main focus of this project: 1) Statistics/ML analysis 2) ML theory/algorithm explanation 3) Spark op/ML demo
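The workflow above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not code from the notebooks: the toy dataset and the names `raw_rows`, `preprocess`, `explore`, `fit_line`, and `rmse` are all hypothetical, and a straight-line fit stands in for whatever model a real analysis would choose.

```python
import math
import statistics

# Hypothetical toy data: (ad_spend, sales) pairs; None marks a missing value.
raw_rows = [
    (1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2),
    (4.0, 7.8), (5.0, None), (6.0, 12.1),
]


def preprocess(rows):
    """Data preprocessing: drop rows that contain a missing field."""
    return [row for row in rows if None not in row]


def explore(rows):
    """EDA: basic summary statistics for both columns."""
    xs, ys = zip(*rows)
    return {
        "n": len(rows),
        "x_mean": statistics.mean(xs),
        "y_mean": statistics.mean(ys),
        "y_stdev": statistics.stdev(ys),
    }


def fit_line(rows):
    """Modeling: ordinary least squares for y = slope * x + intercept."""
    xs, ys = zip(*rows)
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def rmse(rows, slope, intercept):
    """Error check: root mean squared error of the fitted line."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(statistics.mean(squared))


clean = preprocess(raw_rows)
summary = explore(clean)
slope, intercept = fit_line(clean)
print(summary)
print(f"slope={slope:.3f} intercept={intercept:.3f} rmse={rmse(clean, slope, intercept):.3f}")
```

Comparing the RMSE with the spread of the target (`y_stdev`) is one simple way to judge whether the error is acceptable before turning the fit into a business suggestion.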

Quick Start

Quick_start.md

File Structure

├── DE_course       : Code for Udacity data engineer course 
├── DL_             : Deep learning related projects  
├── DS_algorithms   : Build data science models from scratch 
├── GPU             : GPU related code 
├── ML_             : Machine learning related projects  
├── README.md
├── R_              : R programming language related projects 
├── SPARK_          : PySpark basics/op/ML/ETL notebook demo projects
├── Statistics_     : Statistics related projects 
├── archived        : Archived code/projects 
├── doc             : Docs for quick start, theory papers, pictures, and so on
├── ml_demo.py 
├── notebook        : Jupyter notebook related projects (nb server/magic..)
├── project         : Archived projects 
├── pytorch_        : PyTorch related projects 
├── tensorflow_     : TensorFlow related projects
└── utility         : Utility scripts for ML/DL model tuning, DS plots...

Main Projects

Machine Learning

Tensorflow Demo

Statistics

Spark

spark op intro

  • PySpark Basic 1 - Basic Spark ops (transformation & action): RDD, Map, FlatMap, Reduce, Filter, Distinct, Intersection
  • PySpark Basic 2 - Basic Spark ops: load CSV, DataFrame, SparkSQL, transformation in [RDD, DataFrame, SparkSQL]
  • PySpark Basic 3 - Basic Spark ops: Spark DataFrame ops
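
For readers without a Spark installation, the RDD operations named in the first notebook can be mimicked with plain Python. The sketch below is not PySpark code and does not model Spark's distributed execution; the sample data and variable names are hypothetical, and each comment names the RDD method the line imitates.

```python
from functools import reduce
from itertools import chain

# Hypothetical sample input, standing in for the lines of a text RDD.
lines = ["spark makes rdd", "rdd ops are lazy", "spark"]

# map: exactly one output element per input element (rdd.map)
line_lengths = list(map(len, lines))

# flatMap: each input may yield several outputs, flattened into one sequence (rdd.flatMap)
words = list(chain.from_iterable(line.split() for line in lines))

# filter: keep only the elements that satisfy a predicate (rdd.filter)
long_words = [w for w in words if len(w) > 3]

# distinct: remove duplicates (rdd.distinct)
distinct_words = set(words)

# intersection: elements present in both collections (rdd.intersection)
common = distinct_words & {"spark", "hive"}

# reduce: fold all elements into one value; in Spark this is an action (rdd.reduce)
total_chars = reduce(lambda a, b: a + b, map(len, words))

print(line_lengths, long_words, sorted(distinct_words), common, total_chars)
```

In Spark, transformations such as `map` and `filter` are lazy and only run when an action such as `reduce` or `collect` is called. The eager lists above hide that distinction.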

spark ML intro

spark APP

Development

  • dev