All the code for the Machine Learning I-II assignment (MA DTA 2021).
Use famous speeches delivered by US Presidents to train two machine learning and two deep learning models that classify each speech by the political party (Democratic or Republican) of the speaker.
Every ML/DL task starts with getting familiar with the data that will be passed into the pipeline(s). In this case, the corpus consists of 878 speeches, each delivered by either a Democratic or a Republican POTUS. The original corpus is available on Kaggle. Let's look at some graphs that give us a basic understanding of the data at our disposal.
Speeches per party:
Speeches per president:
Distribution of total words:
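The numbers behind these three graphs can be computed with a few lines of Python. The following is only a minimal sketch: the record layout (`president`, `party`, `text`), the placeholder speakers, and the `summarize` helper are assumptions for illustration, not the actual Kaggle schema or the code used in this repository.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy records; the real corpus has 878 speeches and its
# field names may differ.
speeches = [
    {"president": "President A", "party": "Democratic",
     "text": "we choose hope over fear"},
    {"president": "President A", "party": "Democratic",
     "text": "the economy is growing again"},
    {"president": "President B", "party": "Republican",
     "text": "lower taxes and a strong defense"},
]


def summarize(records):
    """Count speeches per party and per president, and collect word counts."""
    word_counts = [len(r["text"].split()) for r in records]
    return {
        "per_party": dict(Counter(r["party"] for r in records)),
        "per_president": dict(Counter(r["president"] for r in records)),
        "word_counts": word_counts,
        "mean_words": mean(word_counts),
        "median_words": median(word_counts),
    }


summary = summarize(speeches)
print(summary["per_party"])
print(summary["per_president"])
print(summary["mean_words"], summary["median_words"])
```

The per-party count is worth checking first, since a strong class imbalance would affect how the classifiers are evaluated.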
The objective is to train the following models to predict the political affiliation (party) of the speaker:
- Logistic Regression (LR)
- Support Vector Machines (SVM)
- Convolutional Neural Network (CNN)
- Long Short-Term Memory Recurrent Neural Network (LSTM-RNN)
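For the two classical models, one common setup is to feed TF-IDF features into scikit-learn classifiers. The sketch below assumes that setup; the TF-IDF features, the `LinearSVC` variant of SVM, the `build_models` helper, and the toy texts are all illustrative assumptions, not necessarily what this repository does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the 878 speeches.
texts = [
    "we must expand access to health care",
    "we will invest in public schools",
    "we will cut taxes for every family",
    "we must shrink the federal government",
]
labels = ["Democratic", "Democratic", "Republican", "Republican"]


def build_models():
    """Return one text-classification pipeline per classical model."""
    return {
        "LR": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
    }


models = build_models()
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)  # vectorize the texts, then fit the classifier
    predictions[name] = model.predict(["we will cut taxes"])
    print(name, predictions[name])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training data only, which matters once the corpus is split into training and test sets.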
Both deep learning models use pre-trained GloVe word embeddings.
See the short Report for a brief explanation of the rationale behind each model and the results.
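Pre-trained GloVe vectors are distributed as plain text, one token per line followed by its vector components. A minimal sketch of turning such a file into an embedding matrix is shown here; the helper names `load_glove` and `build_embedding_matrix`, the toy 3-dimensional vectors, and the `word_index` mapping are assumptions for illustration, and the actual notebooks may load the embeddings differently.

```python
import io

import numpy as np


def load_glove(handle):
    """Parse GloVe text format: each line is a token followed by its vector."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; row 0 is padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:  # words missing from GloVe stay all-zero
            matrix[idx] = vec
    return matrix


# Toy stand-in for a real GloVe file, which would be opened with open(...).
fake_glove = io.StringIO("freedom 0.1 0.2 0.3\ntaxes 0.4 0.5 0.6\n")
word_index = {"freedom": 1, "taxes": 2, "healthcare": 3}  # hypothetical vocabulary

glove = load_glove(fake_glove)
matrix = build_embedding_matrix(word_index, glove, dim=3)
print(matrix.shape)
```

A matrix built this way is typically used to initialize the embedding layer of the CNN and the LSTM, with the layer either frozen or fine-tuned during training.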