Blind Sentiment Analysis

This is my final project for ME315 - Machine Learning in Practice at the London School of Economics, where I studied abroad in the summer of 2019. Since LSE is a social science school, we were encouraged to apply machine learning to a social science problem. I decided to use machine learning to analyze a political environment in a language I don't understand. This isn't a translation task. The first part is a simple classification problem: the dataset was a labeled set of roughly 2,000 Arabic-language tweets. The novelty of the project was that, because I couldn't read the tweets, I was unable to tune parameters by hand and had to work somewhat symbolically. It's also somewhat interesting because Arabic has many dialects, so a word that means one thing in one dialect can mean something else in another, and processing through that noise was part of the challenge.
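A minimal sketch of what a "blind" classifier for the first part could look like, assuming whitespace tokenization and a multinomial Naive Bayes model with add-one smoothing. This is an illustrative choice; this README does not say which classifier the project actually used. The model only counts how often each token appears under each label, so it never needs to know what any word means. The token strings and labels below are hypothetical placeholders, not the project's data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (hypothetical helper) on whitespace tokens."""
    class_counts = Counter(labels)          # label -> number of documents
    word_counts = defaultdict(Counter)      # label -> token -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()                # str.split also works on Arabic script
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)          # log prior
        total_tokens = sum(word_counts[label].values())
        for tok in doc.split():
            if tok not in vocab:
                continue                               # skip tokens never seen in training
            # Laplace-smoothed log likelihood of this token under this label
            score += math.log((word_counts[label][tok] + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical placeholder tokens standing in for tweets the reader cannot interpret.
docs = ["a b c", "a b d", "x y z", "x y w"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, "a b"))  # pos
print(predict_nb(model, "x z"))  # neg
```

Because each token is treated as an opaque string, identical spellings from different dialects are pooled into a single count, which is one concrete way the dialect noise shows up in a model like this.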

The second part is closer to a clustering algorithm. It uses an unusual embedding approach in which I naively reinvented the concept before I knew what an embedding was. It clusters reasonably well: it roughly groups the tweets by topic, and it's kind of cool to see how my unusual distance metric behaves. Again, it's a pretty naive approach, but it serves its purpose: it lets me analyze a language I don't speak far more efficiently than would otherwise be possible. Read the PDF or Word file for a full write-up.
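The custom distance metric itself is only described in the full write-up, so the sketch below substitutes a stand-in: Jaccard distance between the token sets of two tweets, fed into average-linkage agglomerative clustering. Both are swapped-in techniques chosen for illustration, not the project's method. The `dist` parameter is where a different, hand-built metric could be plugged in. The documents are hypothetical placeholders.

```python
def jaccard_distance(a, b):
    """Stand-in metric: 1 - |A & B| / |A | B| over whitespace token sets."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:
        return 0.0                      # two empty documents count as identical
    return 1.0 - len(sa & sb) / len(union)


def agglomerative(docs, k, dist=jaccard_distance):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns a list of clusters, each a sorted list of document indices.
    """
    clusters = [[i] for i in range(len(docs))]
    d = [[dist(x, y) for y in docs] for x in docs]   # pairwise distance matrix
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average distance between every member of cluster i and cluster j
                total = sum(d[p][q] for p in clusters[i] for q in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]                 # safe because j > i
    return clusters


docs = ["a b c", "a b d", "x y z", "x y w"]   # hypothetical placeholder tweets
print(agglomerative(docs, 2))  # [[0, 1], [2, 3]]
```

Since clustering needs only pairwise distances, this style of analysis works the same whether or not the analyst can read the text; the topic labels come afterward, from inspecting which tweets landed together.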