
Machine Learning Projects

Primary language: Jupyter Notebook

List of my Projects

  1. Text mining with Amazon’s book review data
    • Scraped Amazon’s book prices and reviews. Used the tidytext package in R to remove stop words and tokenize documents. Created word clouds and network graphs to display the most frequent words and the connections between words in different types of reviews.
    • Performed sentiment analysis using Google’s Natural Language API. Built a non-linear model to predict ratings for new, unrated books, with an MSE of 0.5.

  2. Population clustering using CDC’s 2016 Annual Survey Data
    • Used ggplot for data visualization and identified the prevalent epidemic diseases across the states.
    • Performed PCA for dimensionality reduction and applied the K-means algorithm for clustering. Identified population clusters with distinct health conditions and found correlations between behaviors and chronic diseases.

  3. Bad loan prediction with Lending Club’s data
    • Used R for data cleaning, missing-data imputation, and data transformation. Applied neural network, random forest, naïve Bayes, and other algorithms with the H2O package to predict bad loans.
    • Calculated and compared variable importance for each model; the best model achieved an accuracy of 67.8%.

  4. Clustering with SVM and K-means
    • Used the K-means algorithm (unsupervised clustering) and SVM (supervised classification) to group the samples of the iris dataset.
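Project 1 used R's tidytext package (for example `unnest_tokens()` with its `stop_words` table) to tokenize reviews and drop stop words before counting words. As an illustration only, and not the project's actual code, the same tokenize, filter, and count step can be sketched with Python's standard library. The regular expression, the tiny `STOP_WORDS` set, and the sample reviews are hypothetical stand-ins; tidytext's tokenizer and stop-word lexicons are considerably more thorough.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# tidytext's stop_words table combines much larger lexicons.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of", "to", "was", "i"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(reviews: list[str]) -> Counter:
    """Count the tokens that are not stop words across all reviews."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(tok for tok in tokenize(review) if tok not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # Made-up reviews, not Amazon data.
    reviews = [
        "This book was a great read and the plot is great.",
        "The plot was slow, but the ending is great.",
    ]
    for word, n in word_counts(reviews).most_common(3):
        print(word, n)
```

Frequency tables like this are the usual input to a word cloud; counting adjacent token pairs instead of single tokens gives the edges of a word network graph.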
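Projects 2 and 4 both rely on K-means. Below is a minimal pure-Python sketch of Lloyd's algorithm, the standard iterative K-means procedure; it is not the code used in these projects, where a library routine such as R's `kmeans()` would normally do this work. The function name `kmeans`, the random initialization from the input points, and the toy 2-D data are assumptions made for illustration. In project 2 the inputs would presumably be the principal-component scores produced by PCA rather than raw survey columns.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Two well-separated toy groups (hypothetical data, not iris or CDC data).
    data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
    labels, centroids = kmeans(data, k=2)
    print(labels)
    print(centroids)
```

Because the result depends on the initial centroids, library implementations usually run several random starts (for example the `nstart` argument of R's `kmeans()`) and keep the solution with the lowest within-cluster sum of squares.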