NLP (Natural Language Processing)
Data set: https://www.kaggle.com/datasets/kazanova/sentiment140?select=training.1600000.processed.noemoticon.csv
The data set has 6 columns:
target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
ids: the id of the tweet (2087)
date: the date of the tweet (Sat May 16 23:58:44 UTC 2009)
flag: The query (lyx). If there is no query, then this value is NO_QUERY.
user: the user that tweeted (robotickilldozr)
text: the text of the tweet (Lyx is cool)
Rename the data set to: twitter16m.csv
Library:
spaCy
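A minimal sketch of loading the renamed file, using only the Python standard library. The column names follow the order documented on the dataset page (target, ids, date, flag, user, text). The file is assumed to have no header row and to be readable as latin-1; both are assumptions to verify against the actual download. The helper name `load_tweets` and the one-row stand-in file are hypothetical, built from the example values in the column descriptions.

```python
import csv
import os
import tempfile

# Column order as documented for Sentiment140; the CSV is assumed to have no header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]


def load_tweets(path, limit=None):
    """Read the headerless CSV into a list of dicts keyed by COLUMNS."""
    rows = []
    # latin-1 is an assumption: it decodes any byte sequence without raising.
    with open(path, newline="", encoding="latin-1") as handle:
        for i, record in enumerate(csv.reader(handle)):
            if limit is not None and i >= limit:
                break
            row = dict(zip(COLUMNS, record))
            row["target"] = int(row["target"])  # polarity: 0, 2 or 4
            rows.append(row)
    return rows


# Demo on a one-row stand-in file so the sketch runs without the real download.
sample = '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="latin-1", newline=""
) as tmp:
    tmp.write(sample)

demo_rows = load_tweets(tmp.name)
os.remove(tmp.name)
print(demo_rows[0]["target"], demo_rows[0]["text"])
```

For the real data, the call would be `load_tweets("twitter16m.csv", limit=1000)`; the `limit` argument keeps experiments fast on the 1.6 million rows.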
General Feature Extraction
File loading
Word counts
Character counts
Average chars per word
Stop words count
Count #hashtags and @mentions
Check whether numeric digits are present in tweets
Upper case word counts
Lower case conversion
Contraction to Expansion
Email removal and counts
Removal of RT (retweet markers)
Removal of special characters
Removal of multiple spaces
Removal of HTML tags
Removal of accented chars
Removal of stop words
Conversion into base form of words
Common occurring words removal
Rare occurring words removal
Word Cloud
Spelling Correction
Tokenization
Lemmatization
Detecting Entities Using NER
Noun Detection
Language Detection
Sentence Translation
Using inbuilt sentiment classifier
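Several of the counting and cleaning steps listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the spaCy-based workflow the notes point to: the stop-word set and contraction table are hypothetical and deliberately tiny, the regular expressions are simplified, and the function names `basic_features` and `clean_text` are made up for this example. Spelling correction, lemmatization, NER, language detection and translation are not covered here.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables; a real run would use fuller lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "i", "it", "not"}
CONTRACTIONS = {"can't": "cannot", "don't": "do not", "i'm": "i am", "it's": "it is"}

# Simplified email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def basic_features(text):
    """Count-based features computed on the raw (uncleaned) tweet."""
    words = text.split()
    char_count = sum(len(w) for w in words)
    return {
        "word_count": len(words),
        "char_count": char_count,
        "avg_word_len": char_count / len(words) if words else 0.0,
        "stop_word_count": sum(w.lower() in STOP_WORDS for w in words),
        "hashtag_count": sum(w.startswith("#") for w in words),
        "mention_count": sum(w.startswith("@") for w in words),
        "numeric_count": sum(w.isdigit() for w in words),
        "upper_word_count": sum(w.isupper() and len(w) > 1 for w in words),
        "email_count": len(EMAIL_RE.findall(text)),
    }


def clean_text(text):
    """Apply the cleaning steps in roughly the order listed in the notes."""
    text = text.lower()
    text = " ".join(CONTRACTIONS.get(w, w) for w in text.split())  # expand contractions
    text = EMAIL_RE.sub("", text)                                  # remove emails
    text = re.sub(r"\brt\b", "", text)                             # remove RT markers
    text = re.sub(r"<[^>]+>", "", text)                            # remove HTML tags
    # Strip accents: decompose, then drop the non-ASCII combining marks.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text)                            # remove special characters
    text = " ".join(text.split())                                  # collapse multiple spaces
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


tweet = "RT @bob: I can't wait for the <b>caf\u00e9</b> launch, mail me at x@y.com #NLP 2024"
features = basic_features(tweet)
cleaned = clean_text(tweet)
print(features)
print(cleaned)
```

The features are computed before cleaning on purpose: hashtags, mentions, upper-case words and emails disappear once the text is lowercased and stripped.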
Advanced Text Processing and Feature Extraction
N-grams, bigrams, etc.
Bag of Words (BoW)
Term Frequency (TF) calculation
Inverse Document Frequency (IDF)
TF-IDF (Term Frequency-Inverse Document Frequency)
Word embeddings (Word2Vec) using spaCy
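The n-gram, bag-of-words and TF-IDF items above can be illustrated by hand on a toy corpus. This sketch uses the plain textbook definitions (TF as relative frequency, IDF as the natural log of N / df); libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ. The three documents are made-up examples and whitespace splitting stands in for real tokenization. Word embeddings are left out because they need a downloaded spaCy model.

```python
import math
from collections import Counter

# Made-up toy corpus; whitespace splitting stands in for a real tokenizer.
docs = ["lyx is cool", "lyx is not cool", "twitter is fun"]
tokenized = [d.split() for d in docs]


def ngrams(tokens, n):
    """Return all contiguous n-token tuples (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Bag of words: one count vector per document over a shared, sorted vocabulary.
vocab = sorted({term for doc in tokenized for term in doc})
bow = [[Counter(doc)[term] for term in vocab] for doc in tokenized]


def tf(term, doc):
    """Term frequency: share of the document's tokens equal to `term`."""
    return doc.count(term) / len(doc)


def idf(term, corpus):
    """Inverse document frequency, unsmoothed; assumes the term occurs somewhere."""
    df = sum(term in doc for doc in corpus)
    return math.log(len(corpus) / df)


def tfidf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)


print(ngrams(tokenized[0], 2))
print(vocab)
print(bow)
print(tfidf("fun", tokenized[2], tokenized))
```

A word that appears in every document, such as "is" here, gets an IDF of zero, which is why TF-IDF downweights it relative to raw counts.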
Machine Learning Models for Text Classification
SGDClassifier
LogisticRegression
LogisticRegressionCV
LinearSVC
RandomForestClassifier
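A minimal sketch of fitting the five models listed above, assuming scikit-learn is installed (the class names in the list are scikit-learn estimators). The eight texts are made-up toy examples labeled with the data set's 0 / 4 polarity codes, the TF-IDF vectorizer is one possible feature step, and the hyperparameters are arbitrary small values chosen so the example runs quickly. Predictions are made on the training texts only to show the API; a real evaluation needs a held-out split.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV, SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy tweets; 0 = negative, 4 = positive as in the data set.
texts = [
    "i hate this phone", "worst day ever", "this is so bad", "i feel terrible today",
    "i love this phone", "best day ever", "this is so good", "i feel great today",
]
labels = [0, 0, 0, 0, 4, 4, 4, 4]

# Arbitrary illustrative settings, not tuned values.
models = {
    "SGDClassifier": SGDClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LogisticRegressionCV": LogisticRegressionCV(cv=2, max_iter=1000),
    "LinearSVC": LinearSVC(),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=10, random_state=0),
}

predictions = {}
for name, model in models.items():
    # Each pipeline turns raw text into TF-IDF vectors, then fits the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(texts, labels)
    predictions[name] = pipeline.predict(texts)
    print(name, list(predictions[name]))
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary tied to the training data, which matters once a separate test split is introduced.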