Primary Language: Jupyter Notebook

Data_Science_Project_template

Organized, beginner-friendly code for common data science and analysis tasks, including:

  • Exploratory data analysis (basic info, distribution, correlation, visualization, etc.)
  • Dealing with time series data
  • Interactive visualization with Plotly
  • Using PySpark to deal with big data
  • Data preprocessing (MinMaxScaler; to be covered: Normalizer, StandardScaler, log transform)
  • Model training - feature selection
  • Model training - unsupervised learning (K-means)
  • Model training - classification tasks (SVM, MLP, XGBoost, CatBoost, Logistic Regression, LightGBM, Random Forest); model evaluation (confusion matrix, ROC, accuracy, precision, recall, F1 score)
  • Model training - regression tasks (LightGBM); model evaluation (MAE, MSE, RMSE, R2)
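
For readers new to the terms in the list above, here is a minimal plain-Python sketch of what min-max scaling and the listed evaluation metrics compute. It is not taken from the notebooks, which presumably rely on library implementations such as scikit-learn. The function names (`min_max_scale`, `classification_metrics`, `regression_metrics`) are hypothetical, and the handling of edge cases (constant features, empty denominators) is an assumption made for this illustration.

```python
import math


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so the minimum maps to a and the maximum to b."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    if hi == lo:
        # Assumption for this sketch: a constant feature maps to the lower bound.
        return [a for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def classification_metrics(y_true, y_pred, positive=1):
    """Binary confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # Assumption for this sketch: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined when the true values are all identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


if __name__ == "__main__":
    print(min_max_scale([2, 4, 6]))
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(regression_metrics([3, 5, 7], [2, 5, 9]))
```

Writing the formulas out by hand is only meant to show what each number means; for real projects the library versions are the safer choice because they cover multi-class labels, sample weights, and other cases this sketch ignores.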