Natural Language Processing with ASametYildirim

What will I learn here?

  • Natural language processing basics
  • Text and word classification and clustering
  • Deep learning in natural language processing

Natural language processing basics

What is natural language processing?

It is more developed with natural language processing, linguistics and artificial intelligence and takes place in almost every area of our daily life. Basically, it aims to provide communication between human language and computer.

Natural language processing is the process of evolving raw texts into a format that computers can interpret.

Historical development of natural language processing

  • Alan Turing "Computing Machinery and Intelligence?" He publishes his article and is accepted as the foundation of artificial intelligence.
  • He is doing the "Goergetown Experiment" in 1954 and aims to translate 60 words from Russian into English.
  • In the 1960s, ELIZA provided communication between humans and machines with the "Rogerian Psychotherapist Simulation".
  • Statistical Machine Translations in the 1980s.
  • 1980 Used for representational learning and deep learning models.
  • 1970 - 19080 Chatbot

Uses of natural language processing

  • Machine Translation
  • Speech Recognition
  • Sentiment Analysis
  • Question Answering

Subheadings of natural language processing

  • Vocabulary -> Search for the roots of a word in a language
  • Syntax -> Explores sentence construction and flexibility in natural languages
  • Semantic Analysis -> Searches sentence meanings with word meaning
  • Ambiguity -> Searches context with word meaning

Major research topics of natural language processing

  • Entity Name Recognition 1- Custom name

2- date, time, location

3- person institution name

  • Summarizing 1- Extractive summarization (Words and sentences in the text)

2- Abstracting summary (Words and sentences that cannot be included in the text)

  • Text Normalization Elimination of typos increases the success rate
  • Text Classification -> Classification of text such as meaning and length

Core Libraries

NLTK

  • Natural Language Toolkit
  • Statistical and symbolic NLP
  • Python
  • Steven Bird & Edward Loper

SPACY

  • Open source code
  • Python and Cpython
  • Matthew Honnibal & Ines Montani
  • Machine learning and neural networks

BASIN

  • Turkish NLP
  • Open Source code
  • Java
  • Ahmet Afsin & Mehmet Dundar AKIN

Tokenize

What is Token?

Tokens are raw sentences, words, punctuation marks, numbers, symbols.

Tokenize a sentence (punctuation) Tokenize words (spaces and punctuation)

Trunking - Headword Extraction - Ineffective Words

Why should we remove stemming and leading words?

For example: It may have taken both a constructive and an inflectional affix, such as flower-lik-ler.

Inferring the meaning of a text is usually done by finding the root. Because the meaning is here.

hulling

It tries to find the root of the word by removing all construction and inflectional suffixes.

Word Types

  • POS: Part-Of-Speech -> Word Type
  • POSTagging -> Marking a Word Type
  • Word Types -> Adjective, Object, Predicate, Conjunction etc.
  • Purpose -> To mark the word type of each word

Rule-Based Vocabulary Marking Models

An unknown type X word

  • Adjective Rule: Adverb + X + Noun
  • -ing Rule: Verb + X (finished with -ing)
  • Capitalization Rule: X (Started with a capital letter)

Word2Vec Algorithms

To learn the word, help is taken from the words next to it. Generally, there are two types.

  • 1-) Skip-Gram: It guesses the words in the middle from the words on the side.
  • 2-) CBOW (Continuous Bag Of Word): Guess the middle word from the words on the side.

if you try the file here, you will get an output like this