BahasaRojakSentimentAnalysis 😸😑😾

Handles out-of-vocabulary (OOV) words in Bahasa Rojak (Malaysian code-mixed language) and performs sentiment analysis with a fine-tuned cross-lingual model, XLM-RoBERTa (XLM-T).

The Jupyter notebooks cover:

Text Preprocessing
Model Fine Tuning
New Data Inference Pipeline

Further project resources are available here: https://drive.google.com/drive/folders/12Uir9KE4B1VL6oQWdj2BWvCUZOC0vWa2

Ablation Settings:

Preprocessing Method	Model 1 (V1)	Model 2 (V2)	Model 3 (V3)	Model 4 (V4)
Remove URLs	✔	✔	✔	✔
Convert to Lowercase	✔	✔	✔	-
Remove Punctuation	✔	✔	✔	-
Remove Irregular Spaces	✔	✔	✔	✔
Handle OOV	✔	✔	✔	✔
Remove Stopwords	✔	✔	-	-
Chinese Character Segmentation	-	✔	✔	-
Remove Rare Words	-	-	✔	-
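
The ablation settings above can be sketched as a single configurable function. This is a minimal illustration, not the notebooks' actual code: the names `OOV_MAP`, `STOPWORDS`, `Settings`, `VARIANTS` and `preprocess` are hypothetical, the two lookup tables are tiny placeholders, the order of the steps is an assumption, and Chinese segmentation is approximated by splitting on individual characters rather than by a real word segmenter. "Remove Rare Words" (V3 only) needs corpus-level frequencies and is left out.

```python
import re
import string
from dataclasses import dataclass

# Hypothetical lookup tables, for illustration only.
OOV_MAP = {"sgt": "sangat", "mkn": "makan"}   # slang/short form -> standard word
STOPWORDS = {"yang", "dan", "the", "and"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
CJK_RE = re.compile(r"([\u4e00-\u9fff])")     # common CJK ideographs
PUNCT_TABLE = str.maketrans("", "", string.punctuation)  # ASCII punctuation only


@dataclass(frozen=True)
class Settings:
    lowercase: bool
    remove_punctuation: bool
    remove_stopwords: bool
    segment_chinese: bool


# Flags copied from the ablation table. Remove URLs, Remove Irregular Spaces
# and Handle OOV are enabled for every variant, so they are not flags here.
VARIANTS = {
    "V1": Settings(True, True, True, False),
    "V2": Settings(True, True, True, True),
    "V3": Settings(True, True, False, True),
    "V4": Settings(False, False, False, False),
}


def preprocess(text: str, variant: str) -> str:
    """Apply the preprocessing steps enabled for the given model variant."""
    s = VARIANTS[variant]
    text = URL_RE.sub(" ", text)                  # Remove URLs
    if s.lowercase:
        text = text.lower()                       # Convert to lowercase
    if s.remove_punctuation:
        text = text.translate(PUNCT_TABLE)        # Remove punctuation
    if s.segment_chinese:
        text = CJK_RE.sub(r" \1 ", text)          # Character-level stand-in
    tokens = text.split()                         # Also collapses irregular spaces
    tokens = [OOV_MAP.get(t.lower(), t) for t in tokens]   # Handle OOV
    if s.remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(tokens)


sample = "Makanan ni SGT sedap!!  https://t.co/abc 好吃 and the best"
for name in VARIANTS:
    print(name, "->", preprocess(sample, name))
```

With the placeholder tables above, V1 yields `makanan ni sangat sedap 好吃 best`, while V4 keeps case, punctuation and stopwords and yields `Makanan ni sangat sedap!! 好吃 and the best`.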

Model Results:

Model	Precision (0)	Precision (1)	Recall (0)	Recall (1)	F1-Score (0)	F1-Score (1)	Accuracy
Model V1	0.716	0.830	0.840	0.702	0.773	0.760	0.767
Model V2	0.768	0.771	0.735	0.801	0.751	0.786	0.770
Model V3	0.794	0.703	0.691	0.802	0.739	0.749	0.744
Model V4	0.861	0.833	0.802	0.884	0.831	0.858	0.845
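
As a sanity check on the table, the F1-Score columns can be recomputed from the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the names `f1_score`, `REPORTED` and `max_deviation` are hypothetical. Because the table values are rounded to three decimals, the recomputed scores can differ from the reported ones in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per model and class, copied from the table.
REPORTED = {
    "V1": {0: (0.716, 0.840, 0.773), 1: (0.830, 0.702, 0.760)},
    "V2": {0: (0.768, 0.735, 0.751), 1: (0.771, 0.801, 0.786)},
    "V3": {0: (0.794, 0.691, 0.739), 1: (0.703, 0.802, 0.749)},
    "V4": {0: (0.861, 0.802, 0.831), 1: (0.833, 0.884, 0.858)},
}


def max_deviation(reported: dict) -> float:
    """Largest absolute gap between recomputed and reported F1 scores."""
    return max(
        abs(f1_score(p, r) - f1)
        for per_class in reported.values()
        for (p, r, f1) in per_class.values()
    )


for model, per_class in REPORTED.items():
    for label, (p, r, f1) in per_class.items():
        print(f"{model} class {label}: recomputed {f1_score(p, r):.3f}, reported {f1:.3f}")
```

All eight recomputed scores agree with the reported values to within 0.001, which is consistent with rounding of the inputs.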

Web application for testing the sentiment analysis model (with Twitter web scraping):

Scrape tweets related to "britneyspears":

Inference Results: