A collection of data science exercises
- Conducted an exploratory analysis of the relationships between phone name, brand, price, and rating in over 400,000 product reviews from Amazon.com.
- Trained a random forest classifier on 90,000 reviews to achieve an 85% F1-score predicting positive, negative, or neutral sentiment.
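A minimal sketch of this kind of three-class sentiment pipeline. The tiny corpus, vectorizer choice, and hyperparameters here are illustrative stand-ins, not the project's actual data or settings:

```python
# Hedged sketch: random-forest sentiment classification on text features.
# The reviews/labels below are hypothetical stand-ins for the Amazon corpus.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

reviews = ["great phone, love it", "terrible battery life", "it is okay",
           "best purchase ever", "broke after a week", "average device"] * 50
labels  = ["positive", "negative", "neutral",
           "positive", "negative", "neutral"] * 50

X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42, stratify=labels)

# Text -> TF-IDF features -> random forest, wrapped in one pipeline.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)

# Weighted F1 accounts for the three-class setting.
score = f1_score(y_test, model.predict(X_test), average="weighted")
print(round(score, 2))
```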
- A handmade implementation of Logistic Regression using TensorFlow and NumPy.
- Trained the classifier on a toy moons dataset and visualized its predictions.
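A sketch of how such a hand-rolled TensorFlow logistic regression might look, trained by gradient descent on a moons dataset. Learning rate, step count, and variable names are assumptions for illustration:

```python
# Hedged sketch: logistic regression from TensorFlow primitives on toy moons.
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X = X.astype(np.float32)
y = y.astype(np.float32).reshape(-1, 1)

# Model parameters: a weight vector and a bias, updated by hand.
w = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.5)

for step in range(500):
    with tf.GradientTape() as tape:
        logits = tf.matmul(X, w) + b
        # Numerically stable log-loss on the raw logits.
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))

# A linear boundary cannot fully separate the moons, so accuracy caps out
# below 100% even at convergence.
preds = (tf.sigmoid(tf.matmul(X, w) + b).numpy() > 0.5).astype(np.float32)
accuracy = float((preds == y).mean())
print(round(accuracy, 2))
```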
- A handmade implementation of Logistic Regression using NumPy.
- Implemented an early-stopping algorithm during training to prevent overfitting and visualized the training and validation set errors over gradient descent iterations.
- Compared the results of batch gradient descent vs. early stopping (virtually the same).
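The early-stopping idea above can be sketched in pure NumPy: run gradient descent, track validation loss, and roll back to the best weights once it stops improving. The toy dataset, learning rate, and patience value are illustrative assumptions:

```python
# Hedged sketch: NumPy logistic regression with a simple early-stopping rule
# (stop when validation loss fails to improve for `patience` consecutive steps).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: two Gaussian blobs for binary classification.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
idx = rng.permutation(400)
X_train, y_train = X[idx[:300]], y[idx[:300]]
X_val, y_val = X[idx[300:]], y[idx[300:]]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(Xm, ym, w, b):
    p = sigmoid(Xm @ w + b)
    return -np.mean(ym * np.log(p + 1e-12) + (1 - ym) * np.log(1 - p + 1e-12))

w, b = np.zeros(2), 0.0
lr, patience = 0.1, 20
best_val, wait = np.inf, 0
best_w, best_b = w.copy(), b

for epoch in range(2000):
    # One full-batch gradient descent step.
    p = sigmoid(X_train @ w + b)
    w -= lr * (X_train.T @ (p - y_train)) / len(y_train)
    b -= lr * np.mean(p - y_train)

    # Early stopping: monitor validation loss, keep the best weights seen.
    val = log_loss(X_val, y_val, w, b)
    if val < best_val:
        best_val, best_w, best_b, wait = val, w.copy(), b, 0
    else:
        wait += 1
        if wait >= patience:
            break  # roll back to the best checkpoint below

w, b = best_w, best_b
val_acc = float(((sigmoid(X_val @ w + b) > 0.5).astype(int) == y_val).mean())
print(round(val_acc, 2))
```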
- Created a text classifier to differentiate spam from ham (i.e. legitimate) emails in the Apache SpamAssassin dataset.
- Used the 'Bag of Words' method of feature extraction to create a matrix of word frequencies.
- Scored an accuracy of 99% on the testing set using a Support Vector Machine classifier.
- Examined the feature importances of a random forest classifier and found that the strongest ham indicator in the dataset was the presence of the IMAP mail protocol in the "received" header field.