/analysis

Repo of practical approaches to data science problems, including notebook demos and working scripts

analysis

This repo is the data science (DS) application demo part of the "Daas (Data as a service) repo". It mainly uses Jupyter notebooks to show step-by-step analysis and ML/DL approaches on various data science subjects. The idea is to demonstrate how a data scientist deals with a new dataset: pre-process the data, do exploratory data analysis (EDA), then run a suitable model and offer suggestions that are feasible for the business and carry acceptable statistical error (i.e. DS workflow: business understanding -> data preprocessing -> EDA -> data understanding -> analysis/modeling). Main focus of this project: 1) Statistics/ML analysis 2) ML theory/algorithm explanation 3) Spark op/ML demo
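As a rough illustration of the DS workflow described above, the sketch below runs the preprocess -> EDA -> modeling -> error-check steps on a tiny dataset using only the Python standard library. The data and all variable names (`raw`, `clean`, `slope`, `rmse`) are made up for this example and do not come from the notebooks in this repo; a real analysis would more likely use pandas and scikit-learn.

```python
import statistics

# Hypothetical raw records (feature x, target y); None marks a missing value.
# These numbers are invented purely for illustration.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, 7.8)]

# 1) Preprocess: drop rows that contain a missing value.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 2) EDA: basic summary statistics of each column.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)

# 3) Modeling: ordinary least squares fit of y = slope * x + intercept.
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in clean)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# 4) Error check: root mean squared error of the fit on the same data.
residuals = [y - (slope * x + intercept) for x, y in clean]
rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5

print(f"rows kept: {len(clean)}, slope: {slope:.2f}, "
      f"intercept: {intercept:.2f}, rmse: {rmse:.3f}")
```

The business-understanding step has no code counterpart here; in practice it decides which target to model and how much error is acceptable.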

Main Projects

Machine Learning

Tensorflow Demo

Statistics

Spark

Spark op intro

  • Pyspark Basic 1 - Basic Spark ops (transformations & actions): RDD, Map, FlatMap, Reduce, Filter, Distinct, Intersection
  • Pyspark Basic 2 - Basic Spark ops: load CSV, DataFrame, SparkSQL, transformations in [RDD, DataFrame, SparkSQL]
  • Pyspark Basic 3 - Basic Spark ops: Spark DataFrame ops
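To give a feel for the RDD operations listed above before opening the notebooks, here is a plain-Python sketch of what each one computes. It is an analogue only and does not use Spark: the comments name the corresponding PySpark RDD method, and the input strings are invented for the example. In Spark, the transformations are evaluated lazily and results carry no ordering guarantee, which is why the set results are sorted here.

```python
from functools import reduce
from itertools import chain

# Toy input, invented for illustration only.
lines = ["spark makes rdd", "rdd ops are lazy", "spark"]
other = {"spark", "hadoop", "rdd"}

# Transformations: each one describes a new dataset.
lengths = list(map(len, lines))                               # rdd.map(len)
words = list(chain.from_iterable(s.split() for s in lines))   # rdd.flatMap(lambda s: s.split())
long_words = [w for w in words if len(w) > 3]                 # rdd.filter(lambda w: len(w) > 3)
unique_words = sorted(set(words))                             # rdd.distinct()
common = sorted(set(words) & other)                           # rdd.intersection(other_rdd)

# Action: collapses the dataset to a single value returned to the driver.
total_chars = reduce(lambda a, b: a + b, lengths)             # rdd.reduce(lambda a, b: a + b)

print(lengths, long_words, unique_words, common, total_chars)
```

Note the difference between `map` and `flatMap`: mapping `split` over the lines would give one list per line, while `flatMap` flattens those lists into a single collection of words.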

Spark ML intro

Spark app

Other Projects

  • dev

Quick start

Quick_start.md