Lessons
Course 01: Mini NLP Tasks
Cleaning
1- Select only words (keep alphabetic tokens).
2- Remove punctuation.
3- Normalize case (e.g. convert everything to lowercase).
4- Remove stop words (very common words such as "the" or "and").
5- Remove one-character words.
6- Remove digits.
7- Stemming (reducing each word to its root, or stem).
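The cleaning steps above can be sketched as a single function. This is a minimal illustration using only the Python standard library, under a few assumptions: the stop-word list is a tiny hypothetical one (a real project would use a full list, e.g. NLTK's stopwords corpus), the steps run in a slightly different order than listed (punctuation is stripped first so that "garden," still counts as a word), and the stemmer is a naive suffix stripper written for this example, not the Porter algorithm.

```python
import string

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "on", "it", "this"}

# Naive suffix list (assumption); this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    # 2- Remove punctuation: delete every ASCII punctuation character
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 3- Normalize case
    text = text.lower()
    # Split on whitespace into raw tokens
    tokens = text.split()
    # 1- and 6- Keep only alphabetic tokens; any token containing a digit is dropped
    tokens = [t for t in tokens if t.isalpha()]
    # 4- Remove stop words
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 5- Remove one-character words
    tokens = [t for t in tokens if len(t) > 1]
    # 7- Stemming
    return [naive_stem(t) for t in tokens]


print(clean("I saw 3 cats playing in the garden, and a dog jumped!"))
# → ['saw', 'cat', 'play', 'garden', 'dog', 'jump']
```

A naive stemmer like this one is easy to read but crude (for example, it turns "running" into "runn"); a library stemmer such as NLTK's PorterStemmer handles far more cases.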
Bag of Words
Bag of Words (BoW) is a model used in natural language processing. It represents a text as the collection of its words, ignoring grammar and word order but keeping how often each word occurs.
One aim of BoW is to categorize documents.
The idea is to analyse and classify different "bags of words" (corpora).
By matching against these categories, we identify which "bag" a given block of text (test data) comes from.
https://ongspxm.github.io/blog/2014/12/bag-of-words-natural-language-processing/
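The matching idea can be sketched in a few lines. This is a hedged example, not the method from the linked post: it assumes one bag per category (all of that category's training documents joined together), a simple lowercase whitespace tokenizer, cosine similarity as the matching score, and hypothetical toy data.

```python
import math
from collections import Counter


def bag(text):
    """Bag of words: token -> count, ignoring word order."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(test_text, corpora):
    """Return the category whose bag is most similar to the test text's bag."""
    test_bag = bag(test_text)
    # One bag per category, built from all of that category's documents
    category_bags = {label: bag(" ".join(docs)) for label, docs in corpora.items()}
    return max(category_bags, key=lambda label: cosine(test_bag, category_bags[label]))


# Hypothetical toy training data (the "bags" to match against)
corpora = {
    "sports": ["the team won the match", "a great goal in the final match"],
    "cooking": ["add salt to the soup", "bake the bread in the oven"],
}
print(classify("the final goal won the match", corpora))  # → sports
```

Note that frequent words such as "the" add to the score of every category, which is one reason the cleaning steps (especially stop-word removal) are usually applied before building the bags.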
Building a "Bag of Words" involves 3 steps:
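--tokenizing: splitting the text into tokens (words)
--counting: counting how many times each token occurs in each document
--normalizing: rescaling the counts (for example by document length, or with tf-idf weighting) so that documents of different sizes can be compared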
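The three steps can be sketched directly in Python. This minimal version assumes a plain lowercase whitespace tokenizer and uses relative frequency (count divided by document length) as the normalization; tf-idf is a common alternative that this sketch does not implement.

```python
from collections import Counter


def tokenize(text):
    """Step 1: split lowercased text on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count(tokens):
    """Step 2: count how often each token occurs."""
    return Counter(tokens)


def normalize(counts):
    """Step 3: divide each count by the total number of tokens (relative frequency)."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def bag_of_words(docs):
    """Build a shared, sorted vocabulary and one normalized vector per document."""
    bags = [normalize(count(tokenize(d))) for d in docs]
    vocabulary = sorted({word for b in bags for word in b})
    # Each row holds the weight of every vocabulary word (0.0 if absent)
    vectors = [[b.get(word, 0.0) for word in vocabulary] for b in bags]
    return vocabulary, vectors


vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)  # → ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)
```

Because every document is mapped onto the same vocabulary, the resulting vectors all have the same length and can be compared or fed to a classifier.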
Natural Language Processing with Python --> http://www.nltk.org/book/
ch 01 : Language Processing and Python
ch 02 : Accessing Text Corpora and Lexical Resources
ch 03 : Processing Raw Text
ch 04 : Writing Structured Programs
ch 05 : Categorizing and Tagging Words