Tweet Classification

A comparative analysis of machine learning classification methods in the field of computational social science. This project was completed during the 2020 election season for the data science course at the University of Notre Dame. It uses pandas, sklearn, pickle, keras, tensorflow, and nltk to build classification models (decision trees, Naïve Bayes, SVMs, and neural networks) that predict the party affiliation of a given user.
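As a rough illustration of what such a comparison can look like, the sketch below fits three of the model families named above on the same TF-IDF features and reports held-out accuracy for each. It is not the project's actual pipeline: the `compare_models` helper, the toy texts, and the `party_a`/`party_b` labels are all invented for this example, and the choice of TF-IDF features is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus; texts and labels are invented for illustration.
train_texts = [
    "cut taxes and shrink the budget",
    "lower taxes help small business",
    "expand healthcare coverage for families",
    "invest in public healthcare and schools",
]
train_labels = ["party_a", "party_a", "party_b", "party_b"]
test_texts = ["taxes on business are too high", "healthcare for every family"]
test_labels = ["party_a", "party_b"]


def compare_models(train_x, train_y, test_x, test_y):
    """Fit each candidate on identical features and return accuracy per model."""
    candidates = {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "naive_bayes": MultinomialNB(),
        "svm": LinearSVC(),
    }
    scores = {}
    for name, classifier in candidates.items():
        # A fresh vectorizer per pipeline keeps the models independent.
        pipeline = make_pipeline(TfidfVectorizer(), classifier)
        pipeline.fit(train_x, train_y)
        scores[name] = pipeline.score(test_x, test_y)  # mean accuracy
    return scores


scores = compare_models(train_texts, train_labels, test_texts, test_labels)
for name, accuracy in sorted(scores.items()):
    print(f"{name}: {accuracy:.2f}")
```

With a corpus this small the accuracies say nothing about the models themselves; the point is only that every classifier sees the same split and the same feature extraction, so the scores are comparable.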

Organization

  • code: Python scripts and notebooks that build the models and output results. Includes nine machine learning models: ID3 trees, CART trees, random forests, KNN, MLP, SVM, Naïve Bayes, convolutional neural networks, and (Distil)BERT.
  • data: Data objects for training and testing, including pickle objects.
  • models: Final models output by the scripts.
  • *.pk: Pickled training and test lemmas and vectorizers.
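The `*.pk` files are pickle objects, so they can be written and read back with Python's standard `pickle` module. The sketch below shows a round trip for a list of lemmas. The helper names `save_object` and `load_object`, the file name `train_lemmas.pk`, and the sample lemmas are hypothetical and not taken from the repository.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Serialize obj to path in binary pickle format."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load a pickled object; only use this on files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical lemmatized tweets: one list of lemmas per tweet.
train_lemmas = [["vote", "election", "ballot"], ["tax", "policy", "debate"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "train_lemmas.pk"  # hypothetical file name
    save_object(train_lemmas, path)
    restored = load_object(path)

print(restored == train_lemmas)
```

The same two helpers would work for a fitted vectorizer, provided the library version used to load it matches the one used to save it. Unpickling can execute arbitrary code, so only load pickle files from a source you trust.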

Links

TPU/BERT Colab Notebook Link: https://colab.research.google.com/drive/1NkGzC5GV8GU1gv5f_1_e5AQAvs_SplvT?usp=sharing

Code and Large Objects: https://drive.google.com/drive/folders/1JjVmiuu_goXA6pqJu8Cdm6U2yeZGInhw?usp=sharing