This project uses a dataset from Kaggle. It is one of the Arabic datasets available for benchmarking classification and other NLP tasks. The dataset is primarily a compilation of several existing datasets, sampled down to 100k rows. It consists of reviews, each of which falls into one of three types: Negative, Positive, or Mixed.
Build a model that predicts whether a text is Negative, Positive, or Mixed.
The dataset combines reviews of hotels, books, movies, products, and some airlines. It has three classes (Mixed, Negative, and Positive). Most labels were mapped from reviewers' ratings: a rating of 3 is Mixed, above 3 is Positive, and below 3 is Negative. Each row holds a label and the review text, separated by a tab (TSV). The review text has been cleaned by removing Arabic diacritics and non-Arabic characters. The dataset contains no duplicate reviews.
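The rating-to-label rule described above can be sketched as a small function. This is only an illustration of the stated thresholds. The function name `score_to_label` is hypothetical, and the use of integer ratings is an assumption, since the rating scale itself is not specified here.

```python
def score_to_label(score: int) -> str:
    """Map a reviewer rating to a sentiment class.

    Follows the rule stated in the dataset description:
    3 -> Mixed, above 3 -> Positive, below 3 -> Negative.
    """
    if score > 3:
        return "Positive"
    if score < 3:
        return "Negative"
    return "Mixed"


# Example ratings (illustrative values, not taken from the dataset).
example_labels = [score_to_label(s) for s in (1, 3, 5)]
```

The dataset already ships with the labels applied, so a function like this is only needed when reproducing the mapping on raw ratings.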
Field Name | Description |
---|---|
Label | User 'sentiment': Mixed, Negative, Positive |
Text | Review text |
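Since each row is a label and a review text separated by a tab, the file can be parsed with Python's standard `csv` module. The sketch below is a minimal example under some assumptions: the helper name `read_reviews` is hypothetical, the label is assumed to come before the text, and the file is assumed to have no header row. The sample rows are made-up placeholders, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def read_reviews(handle):
    """Read (label, text) pairs from a tab-separated file-like object."""
    # QUOTE_NONE keeps quote characters inside reviews as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != 2:
            # Skip blank or malformed lines instead of failing.
            continue
        label, text = fields
        rows.append((label, text))
    return rows


# Made-up placeholder rows standing in for the real TSV file.
sample = io.StringIO(
    "Positive\tفندق رائع\n"
    "Negative\tخدمة سيئة\n"
    "Mixed\tجيد لكن مكلف\n"
)
reviews = read_reviews(sample)

# Count how many reviews fall into each class.
label_counts = Counter(label for label, _ in reviews)
```

For the real file, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded TSV is stored.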
- Number of rows = 100,000
- Number of columns = 2
- Python
- Jupyter Notebook
- PowerPoint for presentation
- web
- ArabicLightStemmer
- libqutrub.conjugator
- naftawayh.wordtag
- tashaphyne.stemming
- plotly.graph_objs
- TruncatedSVD
- TfidfVectorizer
- CountVectorizer
- NMF
- strip_tatweel
- strip_shadda
- FarasaPOSTagger
- FarasaNamedEntityRecognizer
- FarasaDiacritizer
- FarasaSegmenter
- FarasaStemmer
- qalsadi.lemmatizer
- pandas
- numpy
- sklearn.linear_model
- sklearn.model_selection
- sklearn.preprocessing
- sklearn.metrics
- matplotlib.pyplot
- seaborn
- string
- nltk
- warnings
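Two of the libraries listed above, `TfidfVectorizer` and `sklearn.linear_model`, can be combined into a simple baseline classifier for the three classes. The sketch below is one possible arrangement, not necessarily the pipeline used in this project. The training texts are made-up placeholders rather than dataset rows, and the choice of `LogisticRegression` with `max_iter=1000` is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up placeholder reviews, NOT taken from the dataset.
texts = [
    "فندق رائع وخدمة ممتازة",
    "كتاب جميل ومفيد جدا",
    "خدمة سيئة وغرفة قذرة",
    "فيلم ممل وسيء جدا",
    "الموقع جيد لكن الخدمة سيئة",
    "القصة جميلة لكن النهاية مملة",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Mixed", "Mixed"]

# TF-IDF turns each review into a weighted bag-of-words vector;
# logistic regression then learns one set of weights per class.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Predict the class of a new (also made-up) review.
predictions = model.predict(["خدمة ممتازة"])
```

On the full dataset, the data would normally be split with `sklearn.model_selection.train_test_split` and scored with `sklearn.metrics` before drawing any conclusions about accuracy.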