A beginner-friendly, organized collection of common data science and analysis code, including:
- Exploratory Data Analysis (basic info, distribution, correlation, visualization, etc.)
- Dealing with time series data
- Interactive visualization with Plotly
- Using PySpark to deal with big data
- Data preprocessing (MinMaxScaler; to be covered: Normalizer, StandardScaler, log transform)
- Model training - feature selection
- Model training - unsupervised learning (K-means)
- Model training - classification tasks (SVM, MLP, XGBoost, CatBoost, Logistic Regression, LightGBM, Random Forest), model evaluation (confusion matrix, ROC, accuracy, precision, recall, F1 score)
- Model training - regression tasks (LightGBM), model evaluation (MAE, MSE, RMSE, R2)
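
For readers new to the terms in the list above, here is a minimal plain-Python sketch of what min-max scaling and the listed evaluation metrics compute. It is not code from this repository: the function names (`min_max_scale`, `regression_metrics`, `classification_metrics`) are hypothetical, and it uses only the standard library so the formulas stay visible. In practice a library such as scikit-learn would normally be used instead.

```python
import math


def min_max_scale(values):
    """Linearly rescale values to the range [0, 1] (the idea behind MinMaxScaler)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption for this sketch: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


def classification_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up toy inputs, for illustration only.
    print(min_max_scale([2, 4, 6]))
    print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

The zero-division fallbacks (zeros for a constant column, 0.0 for undefined precision or recall, NaN for undefined R2) are choices made for this sketch; libraries may handle these edge cases differently.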