OpenClassrooms intro to NLP course

Primary language: Jupyter Notebook

Intro to NLP

This is the repository for the Introduction to Natural Language Processing (NLP) course on OpenClassrooms, released in January 2021 and updated in January 2022.

We are text-producing factories, day in and day out, in humongous volumes. Thanks to NLP, we can now classify, correct, predict, and even translate almost any kind of text. Join this intro to NLP course and learn how to transform text with word embeddings for exploration and classification.

In this repository you will find the datasets used in the course and the related notebooks.

The course is composed of 3 parts and 12 chapters:

Part I - Text Taming Techniques

  1. Create Your First Wordcloud
  2. Remove Stop Words From a Block of Text
  3. Apply Tokenization Techniques
  4. Create a Unique Word Form With SpaCy
  5. Doing More with SpaCy: POS and NER
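
As a rough illustration of the ideas behind chapters 2 and 3, the sketch below tokenizes a sentence with a regular expression and filters out stop words using only the Python standard library. The `STOP_WORDS` set and the `tokenize` and `remove_stop_words` helpers are hypothetical names chosen for this example; they are not taken from the course notebooks, which use dedicated libraries such as spaCy for the later chapters.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "we"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


tokens = tokenize("We are text producing factories.")
filtered = remove_stop_words(tokens)
print(filtered)
```

A regular expression is the simplest possible tokenizer; it ignores cases such as contractions, hyphenated words, and URLs, which is why library tokenizers are usually preferred in practice.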

Part II - Vectorize Text for Classification Using Bag-of-Words

  1. Apply a Simple Bag-of-Words Approach
  2. Apply the TF-IDF Vectorization Approach
  3. Apply Classifier Models for Sentiment Analysis
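
To make the difference between raw bag-of-words counts and TF-IDF weights concrete, here is a minimal sketch in plain Python. It uses the textbook formula `tf * log(N / df)`; the three-document corpus and the `bag_of_words` and `tf_idf` helper names are assumptions made for this example, and library implementations typically add smoothing and normalization on top of this.

```python
import math
from collections import Counter

# Toy corpus invented for illustration only.
docs = [
    "good movie good plot",
    "bad movie",
    "good acting",
]


def bag_of_words(doc):
    """Count how many times each whitespace-separated word occurs."""
    return Counter(doc.split())


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    counts = [bag_of_words(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return weights


weights = tf_idf(docs)
print(bag_of_words(docs[0]))
print(weights[1])
```

In the second document, "bad" receives a higher weight than "movie" because "movie" also appears in another document, which is exactly the effect TF-IDF is designed to produce.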

Part III - Vectorize Text for Exploration Using Word Embeddings

  1. Discover The Power of Word Embeddings
  2. Compare Embedding Models
  3. Train Your First Embedding Models
  4. Bonus: Extract Information With Regular Expression
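
Word embeddings are usually compared with cosine similarity. The sketch below shows that computation on hand-made 3-dimensional vectors; the `toy_embeddings` values are invented for illustration and do not come from any trained model, and `cosine_similarity` is a hypothetical helper name.

```python
import math

# Hand-made toy vectors (purely illustrative, not from a trained model).
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


royal = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
fruit = cosine_similarity(toy_embeddings["king"], toy_embeddings["apple"])
print(royal > fruit)
```

With real embedding models the vectors have hundreds of dimensions, but the similarity computation is the same.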