What will I learn here?
- Natural language processing basics
- Text and word classification and clustering
- Deep learning in natural language processing
It is more developed with natural language processing, linguistics and artificial intelligence and takes place in almost every area of our daily life. Basically, it aims to provide communication between human language and computer.
Natural language processing is the process of evolving raw texts into a format that computers can interpret.
- Alan Turing "Computing Machinery and Intelligence?" He publishes his article and is accepted as the foundation of artificial intelligence.
- He is doing the "Goergetown Experiment" in 1954 and aims to translate 60 words from Russian into English.
- In the 1960s, ELIZA provided communication between humans and machines with the "Rogerian Psychotherapist Simulation".
- Statistical Machine Translations in the 1980s.
- 1980 Used for representational learning and deep learning models.
- 1970 - 19080 Chatbot
- Machine Translation
- Speech Recognition
- Sentiment Analysis
- Question Answering
- Vocabulary -> Search for the roots of a word in a language
- Syntax -> Explores sentence construction and flexibility in natural languages
- Semantic Analysis -> Searches sentence meanings with word meaning
- Ambiguity -> Searches context with word meaning
- Entity Name Recognition 1- Custom name
2- date, time, location
3- person institution name
- Summarizing 1- Extractive summarization (Words and sentences in the text)
2- Abstracting summary (Words and sentences that cannot be included in the text)
- Text Normalization Elimination of typos increases the success rate
- Text Classification -> Classification of text such as meaning and length
- Natural Language Toolkit
- Statistical and symbolic NLP
- Python
- Steven Bird & Edward Loper
- Open source code
- Python and Cpython
- Matthew Honnibal & Ines Montani
- Machine learning and neural networks
- Turkish NLP
- Open Source code
- Java
- Ahmet Afsin & Mehmet Dundar AKIN
Tokens are raw sentences, words, punctuation marks, numbers, symbols.
Tokenize a sentence (punctuation) Tokenize words (spaces and punctuation)
For example: It may have taken both a constructive and an inflectional affix, such as flower-lik-ler.
Inferring the meaning of a text is usually done by finding the root. Because the meaning is here.
It tries to find the root of the word by removing all construction and inflectional suffixes.
- POS: Part-Of-Speech -> Word Type
- POSTagging -> Marking a Word Type
- Word Types -> Adjective, Object, Predicate, Conjunction etc.
- Purpose -> To mark the word type of each word
An unknown type X word
- Adjective Rule: Adverb + X + Noun
- -ing Rule: Verb + X (finished with -ing)
- Capitalization Rule: X (Started with a capital letter)
To learn the word, help is taken from the words next to it. Generally, there are two types.
- 1-) Skip-Gram: It guesses the words in the middle from the words on the side.
- 2-) CBOW (Continuous Bag Of Word): Guess the middle word from the words on the side.
if you try the file here, you will get an output like this