This portfolio is a compilation of notebooks that I have built for various personal projects and experiments. I've grouped the projects by solution category.
Professional projects and accomplishments can be found on my LinkedIn.
Automatic Essay Grading Tool
Developed an automatic essay grading tool in Python that uses TensorFlow to build a four-layer feed-forward neural network. Additionally, using the Natural Language Toolkit (NLTK) package, I built a regression model that incorporates features such as grammar and sentiment to grade test essay sets. A Spearman's correlation of approximately 0.90 was achieved. The code can be found here.
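For readers unfamiliar with the metric, here is a minimal sketch of how Spearman's correlation can be computed: rank both score lists (averaging ranks for ties), then take the Pearson correlation of the ranks. This is an illustration in plain Python, not the project's code; the function names and the two score lists are made up for the example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical grades, not data from the project.
human_scores = [1, 2, 3, 4, 5]
model_scores = [1.5, 1.0, 3.2, 4.8, 4.1]
rho = spearman(human_scores, model_scores)
```

Because the metric only looks at ranks, it rewards a model for ordering essays correctly even if its raw scores sit on a different scale from the reference grades.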
IMDB Sentiment Analysis
Leveraged the IMDB Sentiment dataset of 50,000 movie reviews and their corresponding sentiment labels (1 or 0) to build a sentiment analysis model. Throughout the process, I compared three vectorization techniques:
- TF-IDF
- Word2Vec
- Universal Sentence Embeddings
The main goal was to compare vectorization techniques, so I used plain Logistic Regression as the model in every case. Unsurprisingly, I found that the Universal Sentence Embeddings performed the best, with an F1 score of #.##. The code can be found here.
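As a reminder of what the F1 score measures, here is a minimal sketch that computes it for binary labels as the harmonic mean of precision and recall. It is a plain-Python illustration rather than the notebook's code; the function name and the toy labels are assumptions made for the example.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = positive review, 0 = negative review).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
f1 = f1_score(y_true, y_pred)
```

Holding the classifier fixed and changing only the vectorizer means any difference in this score can be attributed to the text representation.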
Comparing Various Extractive Text Summarization Techniques
Developed a notebook comparing the approach and results of three extractive text summarization techniques:
- Sentence scoring based on word frequency
- TextRank using Universal Sentence Encoder
- Unsupervised Learning using Skip-Thought Vectors
Results showed that the third approach did the best job of extracting the key themes within the passage. I've published these findings in a "Towards Data Science" Medium article. The notebook can be found here.
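To give a feel for the simplest of the three techniques, here is a minimal sketch of sentence scoring based on word frequency. It is not the notebook's implementation: the tiny stopword list, the regex tokenizer, and the choice to normalize each score by sentence length are all assumptions made for illustration.

```python
import re
from collections import Counter

# Deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "on", "for"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


text = "Cats sleep. Cats chase mice and cats climb trees. Dogs bark loudly at night."
summary = summarize(text, num_sentences=2)
```

The approach needs no model or embeddings, which makes it a useful baseline for the TextRank and Skip-Thought variants.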
Presidential Inauguration Sentiment Comparison
Utilized the NLTK and TextBlob packages to analyze the sentence-by-sentence sentiment of Obama's and Trump's inauguration speeches. Using ggplot, this sentiment was visualized to help analyze each President's emotion as his speech progressed. This analysis was done in Python. The code can be found here. Here is the final output visualization:
Emotion Detection
Developed a convolutional neural network to classify the emotion (one of 7 emotional states) in a static greyscale image. Utilized TensorFlow to construct the network and visualized its performance and architecture using TensorBoard. The code can be found here.
How well would Messi do in a different league?
Developed a negative binomial model to predict the number of goals Lionel Messi would score in a different league. The model currently uses only defense as a predictor; I plan to incorporate weather and historical record against a specific team. The code is written in R and can be found here. The blog post can be found here.
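For context, here is a minimal sketch of the negative binomial distribution in its mean and dispersion form, paired with a log-link mean driven by a single defense predictor. The project itself is coded in R; this Python version is only an illustration, and the coefficient values, the dispersion value, and the `defense_rating` input are hypothetical.

```python
import math


def neg_binomial_pmf(k, mu, r):
    """P(goals = k) for a negative binomial with mean mu and dispersion r.

    The variance is mu + mu**2 / r, so a smaller r means more overdispersion
    relative to a Poisson model with the same mean.
    """
    log_p = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def expected_goals(defense_rating, intercept, slope):
    """Log-link mean: mu = exp(intercept + slope * defense_rating)."""
    return math.exp(intercept + slope * defense_rating)


# Hypothetical coefficients and rating, not fitted values from the project.
mu = expected_goals(defense_rating=0.5, intercept=0.2, slope=-0.8)
goal_probs = [neg_binomial_pmf(k, mu, r=3.0) for k in range(6)]
```

A negative slope here encodes the idea that a stronger opposing defense lowers the expected goal count.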
Exploring NYC Taxi Trip Data
Developed an EDA kernel for the "New York City Taxi Trip Duration" Kaggle competition. The data consisted of start and end coordinates for taxi trips along with their corresponding times. The kernel can be found here.
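With start and end coordinates available, one common exploratory feature is the straight-line distance of each trip. Below is a minimal sketch of the haversine formula for that purpose. Whether the kernel computes this feature is an assumption on my part, and the function name and sample coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative pickup and dropoff points in New York City (not rows from the dataset).
trip_km = haversine_km(40.7580, -73.9855, 40.6413, -73.7781)
```

Dividing this distance by the trip duration gives a rough average speed, which is handy for spotting implausible records during EDA.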