
A Jupyter Notebook containing methods for common tasks related to the field of Natural Language Processing


Natural Language Processing

This module contains methods to accomplish several tasks related to the field of natural language processing (NLP) such as:

P R E P R O C E S S I N G:

Although many high-level APIs are available, cleaning the text manually allows finer control and customization of each step.

  • Loading the data and selecting relevant parts
  • Removing punctuation
  • Removing words shorter than a chosen length
  • Replacing numbers with their word-based equivalents
  • Removing stopwords
  • Lemmatization
  • Tokenization
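
As a rough illustration of what manual cleaning can look like, here is a minimal sketch that uses only the Python standard library. It is not the notebook's actual code: the function name `preprocess`, the tiny `STOPWORDS` set, and the digit-by-digit number replacement in `DIGIT_WORDS` are assumptions made for this example. Lemmatization is left out because it normally relies on an external library.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Simplified number replacement: every digit is spelled out on its own,
# so "42" becomes "four two" rather than "forty-two".
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

# Map every punctuation character to a space so hyphenated words are split
# instead of being merged.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, min_length=3, stopwords=STOPWORDS):
    """Return a list of cleaned tokens for one piece of text."""
    text = text.lower()
    # Replace digits with words, padded with spaces to keep tokens separate.
    text = re.sub(r"\d", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Remove punctuation.
    text = text.translate(PUNCT_TO_SPACE)
    # Tokenize on whitespace.
    tokens = text.split()
    # Drop short words and stopwords.
    return [t for t in tokens if len(t) >= min_length and t not in stopwords]


print(preprocess("The 2 cats are sleeping in the garden!"))
# ['two', 'cats', 'sleeping', 'garden']
```

Writing the steps by hand like this makes it easy to reorder them or to adjust parameters such as `min_length` for a particular dataset.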

N L P:

  • Sentiment Analysis
  • Part-of-Speech-Tagging (POS-Tagging)
  • Named-Entity-Recognition (NER)
  • TF-IDF Scoring
  • Cosine Similarity
  • MinHashing
  • Word Embedding
  • Latent Semantic Analysis (LSA)
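
To make two of the items above more concrete, the following is a minimal sketch of TF-IDF scoring and cosine similarity in plain Python. It is not taken from the notebook: the function names `tf_idf` and `cosine_similarity` are chosen for this example, and the weighting used here (relative term frequency multiplied by the natural log of the inverse document frequency, without smoothing) is only one of several common variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


docs = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "hat"],
    ["dog", "ran", "far"],
]
vectors = tf_idf(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # shared terms, so greater than 0
print(cosine_similarity(vectors[0], vectors[2]))  # no shared terms, so 0.0
```

Storing the vectors as dictionaries keeps them sparse, which matters because most terms do not occur in most documents.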