Project Title: Comparison of Different Machine Learning Algorithms with Sentiment Analysis
Author: Tom Sung
Repository Creation Date: February 17, 2022
This is a self-defined course project relating to Natural Language Processing. The project abstract is as follows.
A comparison of different machine learning algorithms for sentence-level sentiment analysis on a Kaggle dataset was performed. Sentiment analysis (SA) is a subfield of Natural Language Processing (NLP) and is widely used in everyday life, with applications ranging from brand watch to social media monitoring. Thus, a highly accurate SA model has many important practical applications. In this report, several SA models are tested, including a traditional non-neural-network method and several neural-network-based methods. In particular, the traditional method examined was the support vector machine (SVM), and the neural-network-based methods were the convolutional neural network (CNN), the recurrent neural network (RNN), and Bidirectional Encoder Representations from Transformers (BERT). The test results from these networks show that performance is limited with neural network models. It was also found that while CNNs are good for computer vision applications, they are not well suited to sentiment analysis. Lastly, for complex models like BERT, large amounts of training time and computing resources are required to achieve the best performance.
Course: EECE 571T: Advanced Machine Learning Tools (Winter 2021 Term 2)
EECE 571T is a course offered by The Department of Electrical and Computer Engineering at The University of British Columbia (Vancouver, British Columbia, Canada).
The contents of this repository are copyrighted and should be used for reference purposes only. Do not use or copy without permission. In addition, current and future students of this course may not directly use code from this repository to fulfill course assessment requirements. Doing so would violate the Academic Integrity Policy Statement outlined by the university.
The purpose of this repository is to save a copy of my coursework for future reference, whether it is for personal use or for career advancement purposes. It is not intended for others to copy or use.
This repository contains three folders: BERT, SVM_CNN, and WordEmbedding. I wrote the code in BERT and WordEmbedding; the code in SVM_CNN was written by my project partner, who holds the copyright to that material.
Each of the three folders contains a Python .py file and a Jupyter Notebook .ipynb file. Both files contain the same code. However, this project was written mostly in Google Colaboratory, which uses the Jupyter Notebook format, so the .ipynb file may be easier to read.
At the time of writing, the code is not guaranteed to work, as some lines contain URLs that may no longer exist. For example, the link to the dataset used for training, validation, and testing may not work. However, the code should provide a good foundation for those who want an idea of where to start.
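Since the main risk noted above is dead links, one way to find out which URLs still respond before running anything is sketched below. This is a minimal, standard-library-only sketch and is not part of the repository: the function names find_urls and url_is_reachable, the URL pattern, and the 10-second timeout are all assumptions made for illustration. Some servers reject HEAD requests, so a False result suggests, but does not prove, that a link is dead.

```python
import re
import urllib.error
import urllib.request

# Matches http(s) URLs up to the first whitespace, quote, or closing bracket.
# Hypothetical pattern: good enough for URLs embedded in code, not a full parser.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)\]]+")


def find_urls(source_text):
    """Return the unique URLs in source_text, in order of first appearance."""
    seen = []
    for url in URL_PATTERN.findall(source_text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def url_is_reachable(url, timeout=10.0):
    """Return True if a HEAD request to url completes with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False
```

To use it, read one of the .py files into a string, pass it to find_urls, and call url_is_reachable on each result; any URL that reports False is a candidate for replacement with a working copy of the dataset.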