
ID2223_Scalable_Machine_Learning_and_Deep_Learning

Labs and final project for the KTH course ID2223 "Scalable Machine Learning and Deep Learning".

Lab 1: Data Mining in Financial Datasets

  • In the first part of this lab, I built pipelines in Scala on the cluster-computing framework Apache Spark to pre-process the data (cleaning, scaling, feature engineering, etc.). On the pre-processed dataset, I trained four regression models (linear regression, decision tree, random forest, and gradient-boosted trees) to predict housing prices.
  • In the second part, I carried out an exploratory analysis of credit card clients' attributes, pre-processed the data, and trained binary classifiers in Spark to predict each client's risk of default. I compared the performance of logistic regression, decision tree, and random forest classifiers, and used cross-validation-based hyperparameter tuning to improve the models.
  • Achieved the top score in the class for both the analysis and the models' performance.
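Spark ML provides cross-validation-based tuning through `CrossValidator` and `ParamGridBuilder`; the lab's own Scala code is not reproduced here. As an illustration of the technique only, the following is a minimal plain-Python sketch (no Spark) of k-fold grid search. It assumes a hypothetical toy model, a one-feature ridge regression whose penalty `lam` is the hyperparameter being tuned; all function names and the grid values are made up for the example.

```python
import random
from statistics import mean


# Hypothetical toy model standing in for the lab's Spark regressors:
# ridge regression y ~ a*x + b with an L2 penalty `lam` on the slope only.
def fit_ridge_1d(xs, ys, lam):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / (sxx + lam)  # closed-form slope; lam=0 gives ordinary least squares
    b = my - a * mx        # unpenalized intercept
    return a, b


def mse(model, xs, ys):
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, lam, k=5):
    """Mean held-out MSE over k folds for one hyperparameter value."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return mean(scores)


def grid_search(xs, ys, grid, k=5):
    """Evaluate every candidate with k-fold CV and keep the lowest error."""
    results = {lam: cross_validate(xs, ys, lam, k) for lam in grid}
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [3.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]  # synthetic data
    best, results = grid_search(xs, ys, [0.0, 1.0, 10.0, 100.0, 1000.0])
    for lam, err in results.items():
        print(f"lam={lam:>7}: CV MSE={err:.3f}")
    print("best lam:", best)
```

Spark's `CrossValidator` follows the same pattern (fit on k-1 folds, score the held-out fold, average, pick the best parameter combination) and then refits the winning configuration on the full dataset, a step this sketch leaves out.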

Lab 2

  • In this lab assignment, I built an image classification model based on the Inception-v1 convolutional neural network (CNN) architecture using TensorFlow/Keras.
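Inception-v1 (GoogLeNet) is built from modules with four parallel branches: a 1×1 convolution; a 1×1 reduction followed by a 3×3 convolution; a 1×1 reduction followed by a 5×5 convolution; and a 3×3 max pooling followed by a 1×1 projection. The branch outputs are concatenated along the channel axis. The sketch below is plain Python rather than Keras and only does the channel and parameter arithmetic for one module. It assumes the inception (3a) filter counts from the original GoogLeNet paper (192 input channels) and is not taken from the lab's notebook; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InceptionConfig:
    """Filter counts for the four parallel branches of an Inception-v1 module."""
    n1x1: int
    n3x3_reduce: int
    n3x3: int
    n5x5_reduce: int
    n5x5: int
    pool_proj: int


def conv_params(in_ch, out_ch, k):
    """Weights (k*k*in_ch*out_ch) plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def inception_summary(in_ch, cfg):
    """Return (output channels, parameters per branch) for one module.

    Assumes stride 1 and 'same' padding in every branch, so all branches keep
    the input's spatial size and can be concatenated channel-wise.
    """
    branches = {
        "1x1": conv_params(in_ch, cfg.n1x1, 1),
        "3x3": conv_params(in_ch, cfg.n3x3_reduce, 1)
        + conv_params(cfg.n3x3_reduce, cfg.n3x3, 3),
        "5x5": conv_params(in_ch, cfg.n5x5_reduce, 1)
        + conv_params(cfg.n5x5_reduce, cfg.n5x5, 5),
        # Max pooling has no parameters; only the 1x1 projection after it does.
        "pool": conv_params(in_ch, cfg.pool_proj, 1),
    }
    out_ch = cfg.n1x1 + cfg.n3x3 + cfg.n5x5 + cfg.pool_proj
    return out_ch, branches


if __name__ == "__main__":
    cfg_3a = InceptionConfig(64, 96, 128, 16, 32, 32)
    out_ch, params = inception_summary(192, cfg_3a)
    print(out_ch)                 # 256
    print(sum(params.values()))   # 163696
```

With these numbers the module outputs 64 + 128 + 32 + 32 = 256 channels. The 1×1 reductions are what keep the larger kernels affordable: a direct 5×5 convolution from 192 to 32 channels would need 153,632 parameters, versus 15,920 for the 16-channel reduction plus the 5×5 convolution.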

Final Project: Toxic Comments Detector

  • I developed a toxicity model based on a long short-term memory (LSTM) recurrent neural network (RNN) to detect toxic online comments using TensorFlow/Keras. In addition, the model minimizes unintended bias with respect to mentions of identities that are frequently attacked in online communities.
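The README does not say how unintended bias was measured. One common approach, used for example in the Jigsaw "Unintended Bias in Toxicity Classification" Kaggle competition, compares ROC AUC on identity-specific slices of the evaluation set: subgroup AUC (only comments that mention the identity), BPSN AUC (background positive, subgroup negative) and BNSP AUC (background negative, subgroup positive). A low BPSN AUC means non-toxic comments that mention the identity tend to be scored as more toxic than toxic comments that do not. The following is a minimal plain-Python sketch of these metrics under that assumption; the function names and the toy labels, scores and subgroup flags are hypothetical.

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bias_aucs(labels, scores, in_subgroup):
    """Subgroup, BPSN and BNSP AUCs for one identity subgroup."""
    def subset(keep):
        idx = [i for i in range(len(labels)) if keep(i)]
        return [labels[i] for i in idx], [scores[i] for i in idx]

    sub = subset(lambda i: in_subgroup[i])
    # Toxic background comments vs. non-toxic comments mentioning the identity.
    bpsn = subset(lambda i: (in_subgroup[i] and not labels[i])
                  or (not in_subgroup[i] and labels[i]))
    # Non-toxic background comments vs. toxic comments mentioning the identity.
    bnsp = subset(lambda i: (in_subgroup[i] and labels[i])
                  or (not in_subgroup[i] and not labels[i]))
    return {"subgroup_auc": auc(*sub), "bpsn_auc": auc(*bpsn), "bnsp_auc": auc(*bnsp)}


if __name__ == "__main__":
    # Toy data: the last comment is non-toxic but mentions the identity and
    # receives a high score, which is the kind of bias BPSN AUC exposes.
    labels = [1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.1, 0.8, 0.2, 0.95, 0.85]
    in_subgroup = [False, False, False, False, True, True]
    print(bias_aucs(labels, scores, in_subgroup))
    # {'subgroup_auc': 1.0, 'bpsn_auc': 0.5, 'bnsp_auc': 1.0}
```

In this toy example the model separates toxic from non-toxic comments perfectly within the subgroup, yet the BPSN AUC of 0.5 shows it scores a harmless identity mention above some genuinely toxic background comments.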

How to use

Clone the repository, including the data files, and run the Jupyter notebook (.ipynb) in each directory. Before running a notebook, install the library dependencies listed in its top cells.