Sports-Bot: Twitter Injury Report Detection

App: Sports-Bot-App
Website: Sports-Bot
Presentation: Presentation

This directory contains the Twitter bot project submitted to Unicode Research, an online teaching and research organization that provides classes and competitions for students.

The goal of this project is to build a model that classifies baseball injury reports from Twitter data. The best model found during the 8-week course and competition was the pre-trained RoBERTa neural network. The long-term goal is to collect injury data for players in all sports, classify the type of injury, and create a website where this data can be displayed.

Among all applicants, this project won 1st place.

Model	Data Type / Epochs	Sensitivity	Specificity	Precision	F1 Score	Accuracy
kNN	Boolean	0.2734	0.9965	0.9182	0.4214	0.9058
kNN	TF-IDF	0.1957	0.9942	0.8294	0.3167	0.8941
Bernoulli NB	Boolean	0.8614	0.9486	0.7061	0.776	0.9377
Multinomial NB	Count	0.8614	0.9342	0.6525	0.7425	0.9251
Logistic Regression	TF-IDF	0.9373	0.9787	0.8629	0.8986	0.9735
Random Forest	Boolean	0.7631	0.9719	0.7959	0.7792	0.9458
Random Forest	TF-IDF	0.8502	0.9522	0.7184	0.7787	0.9394
SVM	TF-IDF	0.8661	0.9909	0.9315	0.8976	0.9752
LSTM	10	0.9353	0.985	0.9541	0.9466	0.9726
GRU	10	0.9226	0.9864	0.9577	0.9398	0.9705
RoBERTa	5	0.9478	0.9887	0.9664	0.957	0.9782
XLM-RoBERTa	5	0.9648	0.9855	0.9568	0.9608	0.9803
XLNet	5	0.9691	0.9841	0.953	0.9609	0.9803
DistilBERT	5	0.8706	0.9812	0.9393	0.9036	0.9536
DistilBERT FT	5	0.8861	0.9789	0.9333	0.9091	0.9557

The best in each category is bolded, but for our purposes our most important metric is specificity.
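For readers unfamiliar with the column names in the table above, the following is a minimal sketch of how the five metrics are conventionally computed from binary confusion-matrix counts. It is not taken from the project's code: the function names `binary_metrics` and `f1_from` and the example counts are hypothetical. It assumes the injury class is the positive class. The last lines check that the RoBERTa row is internally consistent, using the precision and sensitivity values reported in the table.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the five table metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, with "injury report"
    assumed to be the positive class (an assumption, not stated in the source).
    """
    sensitivity = tp / (tp + fn)          # share of real injury tweets caught (recall)
    specificity = tn / (tn + fp)          # share of non-injury tweets correctly rejected
    precision = tp / (tp + fp)            # share of flagged tweets that are real injuries
    f1 = f1_from(precision, sensitivity)  # harmonic mean of precision and sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


def f1_from(precision, sensitivity):
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only (not the project's data).
example = binary_metrics(tp=90, fp=10, tn=880, fn=20)
print(example)

# Consistency check against the RoBERTa row: precision 0.9664, sensitivity 0.9478.
roberta_f1 = f1_from(0.9664, 0.9478)
print(round(roberta_f1, 3))  # 0.957, matching the F1 Score column
```

Because most tweets are not injury reports, accuracy alone can look high even when sensitivity is poor, as the kNN rows show.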

The score most valued for our use case was sensitivity, so we label the best "classical" machine learning model and the best neural network model with bold text. All classical models were trained on the full dataset (15,000 data points) using stratified sampling, while all neural networks were trained on a curated sample of the dataset (7,000 data points) to deal with class imbalance.
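Stratified sampling, as mentioned above, splits the data so that each class keeps the same share in the training and test sets. The following is a minimal sketch of the idea in plain Python, not the project's actual procedure: the function name `stratified_split`, the 20% test fraction, the fixed seed, and the toy label counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class's proportion preserved.

    test_fraction and seed are illustrative defaults, not values from the project.
    """
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction from every class.
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced labels: 1 = injury report, 0 = any other tweet.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)

# Both splits keep the 10% injury rate of the full set.
print(sum(labels[i] for i in train_idx) / len(train_idx))
print(sum(labels[i] for i in test_idx) / len(test_idx))
```

A plain random split of imbalanced data could leave the test set with too few injury tweets to estimate sensitivity reliably, which is the problem stratification avoids.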

This folder contains the most up-to-date version of the project. To see the project as it looked a week after the final presentation, see the Sports Injury Classification repository.

