
Machine Translation, ASR, Sentiment Analysis, Classification solutions

Primary language: Jupyter Notebook. License: MIT.

Natural-Language-Processing-NLP

Overview

Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written -- referred to as natural language.

It is a component of artificial intelligence.

The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.

The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Definition

Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language.

NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding.

History

Natural language processing has its roots in the 1950s.

In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence.

The proposed test includes a task that involves the automated interpretation and generation of natural language.

The historical stages are listed below:

Symbolic NLP (1950s – early 1990s): The premise of symbolic NLP is well summarized by John Searle's Chinese room thought experiment. Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts.

Statistical NLP (1990s–2010s): Up to the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing.

Neural NLP (present): In the 2010s, representation learning and deep neural network-style machine learning methods became widespread in natural language processing.

Approaches

There are three main groups of approaches to solving NLP tasks.

  1. Rule-based: Regular expressions and context-free grammars are textbook examples of rule-based approaches to NLP.

  2. "Traditional" Machine Learning: It includes probabilistic modeling, likelihood maximization, and linear classifiers.

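  3. Neural Networks: Like "traditional" machine learning, these models are trained from data, but they learn their own features instead of relying on hand-engineered ones and typically need a much larger training corpus.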
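As a rough illustration of the rule-based approach, the sketch below uses a single hand-written regular expression to pull dates out of text. The pattern, the `extract_dates` helper, and the sample sentence are assumptions made up for this example; they are not taken from the notebooks in this repository.

```python
import re

# Hypothetical rule: a date is written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text):
    """Return every (year, month, day) tuple matched by the hand-written rule."""
    return [
        tuple(int(part) for part in match.groups())
        for match in DATE_PATTERN.finditer(text)
    ]


sample = "The challenge opened on 2021-03-15 and closed on 2021-06-30."
print(extract_dates(sample))
```

The rule is easy to read and debug, but it only covers the one format it was written for, which is the usual trade-off of rule-based systems.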

Algorithms

Natural language processing algorithms aid computers by emulating human language comprehension.

Some NLP algorithms are:

Lemmatization and Stemming: Both techniques reduce the morphological variants of a word (for example "walks", "walked" and "walking") to a single root form. Stemming does this by stripping affixes, while lemmatization maps each word to its dictionary form.
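The difference between the two can be sketched in a few lines. This is a deliberately naive toy, not the Porter stemmer or any library's lemmatizer: the suffix list and the `LEMMAS` lookup table are hypothetical and cover only a handful of English words.

```python
# Toy suffix list, checked in order (assumption for illustration only).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lookup table for irregular forms a suffix rule cannot handle.
LEMMAS = {"was": "be", "were": "be", "better": "good", "mice": "mouse"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def naive_lemmatize(word):
    """Use the dictionary form when known, otherwise fall back to stemming."""
    word = word.lower()
    return LEMMAS.get(word, naive_stem(word))


for w in ["walking", "walked", "walks", "mice", "were"]:
    print(w, naive_stem(w), naive_lemmatize(w))
```

Real projects would normally rely on an established stemmer or lemmatizer; the point here is only that stemming is rule-driven while lemmatization needs vocabulary knowledge.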

Topic Modelling: Topic modelling is a technique for discovering the abstract "topics" that run through a collection of texts, so that each document can be described by the topics it contains.

Keyword Extraction: Keyword extraction is one of the most important tasks in natural language processing. It covers methods for identifying the most significant words and phrases in a text or a collection of texts.
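One common way to score keywords is TF-IDF, which favours terms that are frequent in one document but rare across the collection. The sketch below is a minimal standard-library version under that assumption; the `tokenize` and `top_keywords` helpers and the three sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(documents, doc_index, k=3):
    """Rank the terms of one document by TF-IDF against the whole collection."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    term_freq = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


docs = [
    "the farmer planted maize and the maize grew",
    "the radio played music all day",
    "the farmer listened to the radio",
]
print(top_keywords(docs, 0, k=1))
```

A word such as "the" appears in every document, so its inverse document frequency is zero and it never ranks as a keyword.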

Knowledge Graphs: A knowledge graph stores information as triples. Each triple has three items: a subject, a predicate, and an object.
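A minimal sketch of triple storage is shown below. The `TripleStore` class is a hypothetical in-memory structure written for this example, not a real graph database API; the facts loaded into it come from the History section of this README.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store for (subject, predicate, object) triples."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Record one triple, indexed by its subject."""
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked to the subject by the given predicate."""
        return [
            obj
            for pred, obj in self._by_subject.get(subject, [])
            if pred == predicate
        ]


graph = TripleStore()
graph.add("Alan Turing", "wrote", "Computing Machinery and Intelligence")
graph.add("Alan Turing", "proposed", "Turing test")
graph.add("John Searle", "proposed", "Chinese room")

print(graph.objects("Alan Turing", "proposed"))
```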

Sentiment Analysis: Sentiment analysis is one of the most widely used NLP techniques. It is especially useful where consumers share their opinions and suggestions, such as in consumer polls, ratings, and debates on social media.
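The simplest form of sentiment analysis counts words from a positive and a negative lexicon. The sketch below assumes two tiny hand-picked word sets (`POSITIVE`, `NEGATIVE`) that are purely illustrative; practical systems use much larger lexicons or trained classifiers, and this version ignores negation and sarcasm.

```python
import re

# Hypothetical mini-lexicons, chosen only for this example.
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment(text):
    """Label text by counting positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum((tok in POSITIVE) - (tok in NEGATIVE) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "I love this product, it is great",
    "Terrible support and slow delivery",
    "It arrived on Tuesday",
]:
    print(review, "->", sentiment(review))
```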

Tokenization: The process of breaking text down into smaller units (tokens), such as sentences, words, or subwords.
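A minimal sketch of tokenization with regular expressions follows: text is split first into sentences and then into word and punctuation tokens. Both helper functions are assumptions written for this example, and the sentence rule is naive (it would split wrongly after abbreviations such as "Dr.").

```python
import re


def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Return words (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


text = "NLP is fun. Isn't it? Yes!"
for sent in sentence_tokenize(text):
    print(word_tokenize(sent))
```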

Applications

Speech Recognition: Speech recognition is a technology that enables a computer to convert voice input into a machine-readable format, typically text.

Voice Assistants and Chatbots: Most of us are familiar with voice assistants like Alexa, Siri and Google Assistant, and with the chatbots integrated into many websites to help and guide new users.

Auto Correct and Auto Prediction: Many software tools now check the grammar and spelling of the text we type and save us from embarrassing mistakes in our emails, texts or other documents.

Email Filtering: Gmail classifies emails into primary, social and promotional sections. Spam emails are also sent to a separate section so that they do not flood our inbox.

Translation: Social media has brought the entire world together, but with that come challenges like the language barrier. Translation software, whether standalone or integrated into other applications, has lowered this hurdle considerably.

Zindi has hosted several challenges based on natural language processing solutions.

Audio:

  1. AI4D Baamtu Datamation Automatic Speech Recognition

  2. Fowl Escapades

  3. GIZ NLP Agricultural Keyword Spotter

  4. Swahili Audio Classification Hackathon by #ZindiWeekendz

  5. Swahili Audio Classification

Text:

  1. #ZindiWeekendz Learning: To Vaccinate or Not to Vaccinate

  2. AI4D Malawi News Classification Challenge

  3. AI4D Takwimu Lab Machine Translation Challenge

  4. AI4D Yorùbá Machine Translation Challenge

  5. AI4D iCompass Social Media Sentiment Analysis

  6. Sustainable Development Goals (SDGs) Text Classification Challenge

  7. AI4D Swahili News Classification Challenge