AndreasScharnetzki/NLP

A Jupyter Notebook containing methods for common tasks related to the field of Natural Language Processing

Natural Language Processing

This module contains methods for several tasks in natural language processing (NLP), such as:

P R E P R O C E S S I N G:

Although many high-level APIs are available, cleaning the text manually gives finer control over how each step is customized.

Loading the data and selecting relevant parts
Removing punctuation
Removing words shorter than a chosen length
Replacing numbers with their word-based equivalents
Removing stopwords
Lemmatization
Tokenization
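
As a rough illustration of how several of these steps can be chained by hand, here is a minimal sketch that uses only the Python standard library. It is not the notebook's own code: the function name `preprocess`, the tiny `STOPWORDS` set, and the single-digit `DIGIT_WORDS` mapping are all assumptions made for this example, and lemmatization is left out because it usually relies on an external library.

```python
import string

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real list would be much larger.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "of", "to", "in"}

# Hypothetical mapping that only covers single digits.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def preprocess(text, min_length=3):
    """Return a list of cleaned tokens for one piece of text."""
    # Lowercase the text and strip ASCII punctuation.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Tokenize on whitespace.
    tokens = cleaned.split()
    # Replace single-digit tokens with their word-based equivalent.
    tokens = [DIGIT_WORDS.get(token, token) for token in tokens]
    # Drop stopwords and words shorter than the chosen length.
    return [t for t in tokens if len(t) >= min_length and t not in STOPWORDS]


print(preprocess("The 3 cats are sleeping, in a box!"))
```

In this sketch the numbers are replaced before short words are removed. Otherwise a token such as "3" would be discarded before it could be converted.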

N L P:

Sentiment Analysis
Part-of-Speech-Tagging (POS-Tagging)
Named-Entity-Recognition (NER)
TF-IDF Scoring
Cosine Similarity
MinHashing
Word Embedding
Latent Semantic Analysis (LSA)
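
To make two of the tasks above more concrete, the following sketch computes TF-IDF scores and the cosine similarity between documents in plain Python. It assumes one common TF-IDF variant (term frequency normalized by document length, multiplied by an unsmoothed logarithmic inverse document frequency); the notebook may use a different weighting or a library implementation. The function names `tf_idf_vectors` and `cosine_similarity` are chosen for this example only.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Map each tokenized document to a sparse {term: tf-idf weight} dict."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))

    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # tf = relative frequency in the document, idf = log(N / df)
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here for all-zero vectors
    return dot / (norm_a * norm_b)


docs = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "hat"],
    ["dog", "barked"],
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # shared terms, so above 0
print(cosine_similarity(vectors[0], vectors[2]))  # no shared terms
```

With the unsmoothed IDF assumed here, a term that appears in every document gets a weight of zero, so it does not contribute to the similarity.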