Data-Science-Notebooks

A collection of data science exercises

Primary language: Jupyter Notebook

  • Conducted an exploratory analysis of the relationships between phone name, brand, price, and rating in over 400,000 product reviews from Amazon.com.
  • Trained a random forest classifier on 90,000 reviews to achieve an 85% F1 score predicting positive, negative, or neutral sentiments.
  • A handmade implementation of Logistic Regression using TensorFlow and NumPy.
  • Trained the classifier on a toy moons dataset and visualized its predictions.
  • A handmade implementation of Logistic Regression using NumPy.
  • Implemented an Early Stopping algorithm during training to prevent overfitting and visualized the training and validation set errors over gradient descent iterations.
  • Compared the results of batch gradient descent and early stopping (virtually the same).
  • Created a text classifier to differentiate spam from ham (i.e. legitimate) emails in the Apache SpamAssassin dataset.
  • Used the 'Bag of Words' method of feature extraction to create a matrix of word frequencies.
  • Scored an accuracy of 99% on the testing set using a Support Vector Machine classifier.
  • By examining the feature importances of a Random Forest classifier, I found that the key word feature among ham emails in the dataset was the presence of the IMAP email protocol in the "received" field.
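
The 'Bag of Words' step mentioned in the spam classifier notes can be illustrated with a minimal sketch in plain Python. This is not the notebook's actual code: the function names (`tokenize`, `build_vocabulary`, `bag_of_words`), the simple letters-only tokenizer, and the three sample emails are all assumptions made for illustration. It only shows how a matrix of word frequencies is built, with one row per email and one column per vocabulary word.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(documents, vocabulary):
    """Return one row of word counts per document, with columns ordered as in the vocabulary."""
    matrix = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        matrix.append([counts.get(word, 0) for word in vocabulary])
    return matrix


# Hypothetical sample emails, not taken from the SpamAssassin dataset.
emails = [
    "Win money now, win big",
    "Meeting notes attached",
    "Win a free meeting",
]

vocabulary = build_vocabulary(emails)
matrix = bag_of_words(emails, vocabulary)

for row in matrix:
    print(row)
```

Each row of `matrix` can then be passed as a feature vector to a classifier such as a Support Vector Machine. In practice a library vectorizer would usually replace this hand-rolled version, since it handles sparse storage and larger vocabularies more efficiently.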