arabizi-sentiment-analysis

Tunisian Arabizi dialect sentiment analysis

Primary language: Jupyter Notebook

AI4D iCompass Social Media Sentiment Analysis for Tunisian Arabizi

Brief Description

TUNIZI is the first 100% Tunisian Arabizi sentiment analysis dataset, developed as part of AI4D’s ongoing NLP project for African languages. Tunisian Arabizi is the representation of the Tunisian dialect written in Latin characters and numbers rather than Arabic letters.
The objective of this challenge is to build a sentiment analysis classifier for the Tunisian Arabizi dialect.
For more information about this challenge, see its page on Zindi.
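Before training a classifier, it helps to look at how the labels in the training data are distributed. The minimal sketch below counts them using only the standard library. It assumes (this README does not say) that `data/Train.csv` has a header row with a `label` column; the helper names `count_labels` and `label_distribution` are hypothetical and not part of the repo's `nlp` package.

```python
import csv
from collections import Counter
from pathlib import Path


def count_labels(lines, label_column="label"):
    """Count label occurrences in CSV text given as an iterable of lines.

    The first line must be a header that contains `label_column`.
    Label values are kept as strings exactly as they appear in the file.
    """
    reader = csv.DictReader(lines)
    return Counter(row[label_column] for row in reader)


def label_distribution(csv_path, label_column="label"):
    """Open a CSV file (e.g. data/Train.csv) and count its labels."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return count_labels(f, label_column)
```

For example, `label_distribution("data/Train.csv")` returns a `Counter` keyed by the label strings found in the file; a strongly skewed count is a hint to look at per-class errors and not only at overall accuracy.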

Repo Structure

|---- nlp (package)
|       |--- . . .
|       |--- {module}.py
|       |--- . . .
|
|---- data (placeholder for raw and preprocessed data)
|       |--- Train.csv
|       |--- Test.csv
|       |--- SampleSubmission.csv
|       |--- . . .
|
|---- notebooks
|       |--- AI4D_Processing.ipynb
|       |--- AI4D_rzA27Luehf.ipynb
|       |--- AI4D_AH7LwUXCvT.ipynb
|       |--- AI4D_10WwJdQcXs.ipynb
|
|---- submissions (auto-generated)
|       |--- *.csv
|       |--- *.csv
|
|---- setup.py
|---- Readme.md

PS: This is not the final structure; new directories are created while the code runs.
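A quick check that the inputs sit where the structure above expects them can save a failed notebook run. The sketch below is a hypothetical helper (`missing_files` is not part of the repo); its file list is copied from the tree above, and the outputs listed under Expectations could be appended to it in the same way.

```python
from pathlib import Path

# Files the tree above expects before anything is run.
REQUIRED_FILES = (
    "data/Train.csv",
    "data/Test.csv",
    "data/SampleSubmission.csv",
    "notebooks/AI4D_Processing.ipynb",
    "notebooks/AI4D_rzA27Luehf.ipynb",
    "notebooks/AI4D_AH7LwUXCvT.ipynb",
    "notebooks/AI4D_10WwJdQcXs.ipynb",
    "setup.py",
)


def missing_files(repo_root=".", required=REQUIRED_FILES):
    """Return the required relative paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

Running `missing_files()` from the repository root should return an empty list when the layout matches.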

How to run the code

Steps

# 1. Lay out the files as in the repo structure above (raw CSVs in 'data/', notebooks in 'notebooks/')
# 2. Run 'pip install ./'
# 3. Run 'notebooks/AI4D_Processing.ipynb'
# 4. Run 'notebooks/AI4D_rzA27Luehf.ipynb', 'notebooks/AI4D_AH7LwUXCvT.ipynb', 'notebooks/AI4D_10WwJdQcXs.ipynb'
# 5. Run 'python blend.py'
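Steps 2 to 5 can also be chained from a single script. The sketch below is an assumption-laden convenience, not part of the repo: `build_commands` and `run_pipeline` are hypothetical names, it assumes the notebooks run to completion without manual edits, and it assumes `blend.py` is run from the repository root as in step 5. It executes each notebook headlessly with `jupyter nbconvert --to notebook --execute --inplace`, which saves the outputs back into the notebook file.

```python
import subprocess
import sys

# Order matters: the processing notebook (step 3) must run before the model notebooks (step 4).
NOTEBOOKS = (
    "notebooks/AI4D_Processing.ipynb",
    "notebooks/AI4D_rzA27Luehf.ipynb",
    "notebooks/AI4D_AH7LwUXCvT.ipynb",
    "notebooks/AI4D_10WwJdQcXs.ipynb",
)


def build_commands(notebooks=NOTEBOOKS):
    """Return the commands for steps 2 to 5, in execution order."""
    commands = [[sys.executable, "-m", "pip", "install", "./"]]  # step 2
    for nb in notebooks:
        # Execute the notebook non-interactively and overwrite it with its outputs.
        commands.append(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        )
    commands.append([sys.executable, "blend.py"])  # step 5
    return commands


def run_pipeline(cwd="."):
    """Run every step from the repo root, stopping at the first failing command."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)
```

Using `check=True` makes a failing notebook raise `subprocess.CalledProcessError` immediately, so `blend.py` never runs on incomplete submissions.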

Expectations

To check that everything is working, here is what to expect after each of the steps above:

# 1. Nothing is produced; the files are simply in place.
# 2. This step installs the nlp package
# 3. After this step, verify that 'data/{TrainNormalized, TestNormalized}.csv' exist
# 4. Directory 'submissions/' will be added to the repo structure and contain '{multi-dialect-bert-base-arabic*, bert-multilingual-cased*, roberta-base*}.csv'.
# 5. This step performs a simple weighted blend of the submissions, then creates 'submissions/final_submission.csv', which is the final submission file.
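The blending step is not described beyond "a simple weight-blend", so the sketch below shows one common way to blend hard-label submissions: weighted voting. It is not necessarily what `blend.py` does. It assumes each submission CSV has `ID` and `label` columns, and all function names and the example weights are hypothetical. If the submission files held class probabilities instead, a weighted average followed by an argmax would be the analogous approach.

```python
import csv
from collections import defaultdict


def read_submission(path, id_column="ID", label_column="label"):
    """Load a submission CSV into a dict mapping ID -> label (as a string)."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[id_column]: row[label_column] for row in csv.DictReader(f)}


def weighted_vote(predictions, weights):
    """Blend several models' label predictions by weighted voting.

    predictions: list of dicts mapping ID -> label, one dict per model
    weights: one non-negative weight per model
    Returns a dict mapping ID -> the label with the highest total weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    scores = defaultdict(lambda: defaultdict(float))
    for preds, weight in zip(predictions, weights):
        for row_id, label in preds.items():
            scores[row_id][label] += weight
    # On a tie, max() keeps the first candidate, i.e. the label that sorts first.
    return {row_id: max(sorted(votes), key=votes.get) for row_id, votes in scores.items()}


def write_submission(path, blended, id_column="ID", label_column="label"):
    """Write the blended predictions back out in the same two-column format."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([id_column, label_column])
        for row_id, label in blended.items():
            writer.writerow([row_id, label])
```

A usage example, with made-up weights: `weighted_vote([read_submission(p) for p in paths], [0.4, 0.3, 0.3])`, where `paths` lists the three CSVs produced in step 4. Weights are usually tuned on a validation split or leaderboard feedback.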

Leaderboard name: Muhamed_Tuo
Rank: 9th/312
Accuracy: 0.8362 (private), 0.8394 (public)

Authors

Name          Zindi ID       GitHub ID
Muhamed TUO   @Muhamed_Tuo   @NazarioR9