/nlp-warmup

Fundamentals of natural language processing using the Python Natural Language Toolkit (NLTK)


NLP

This repository covers the basics of the Python Natural Language Toolkit (NLTK).

Prerequisites

This tutorial can help you understand the basics of natural language processing (NLP).

Table of contents

| Chapter | Description |
| --- | --- |
| 1 NLP Basics | NLTK overview. Using regex for text data. Using Machine Learning pipelines for preprocessing. |
| 2 Supplemental Data Cleaning | Introducing and comparing stemming and lemmatization. |
| 3 Vectorizing Raw Data | Introducing and comparing vectorization methods like count vectorization, N-gram and inverse document frequency weighting. |
| 4 Feature Engineering | Feature creation. Feature evaluation. Identifying candidate features for transformation. Box-Cox power transformation. |
| 5 Building Machine Learning Classifiers | Cross validation. Evaluation metrics. Hold out method. Random Forest classifier. Grid search for hyperparameter adjustment. Gradient Boosting method. |
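
As a rough illustration of the ideas behind chapters 1 and 3, the sketch below tokenizes text with a regex, builds N-grams, and computes count vectors and inverse document frequency weights. It is written with the standard library only, so it is a stand-in for, not a copy of, what the notebooks do with NLTK and other libraries. The function names and the two sample sentences are hypothetical, and the weighting uses the plain log(N / df) form; library implementations typically apply a smoothed variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regex."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(documents):
    """Return (vocabulary, rows); each row holds raw term counts per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


def idf_weights(rows):
    """Unsmoothed inverse document frequency per column: log(N / df)."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    weights = []
    for j in range(n_terms):
        # Number of documents in which term j appears at least once.
        doc_freq = sum(1 for row in rows if row[j] > 0)
        weights.append(math.log(n_docs / doc_freq))
    return weights


def tfidf(rows):
    """Scale each raw count by the IDF weight of its term."""
    weights = idf_weights(rows)
    return [[count * w for count, w in zip(row, weights)] for row in rows]


# Hypothetical sample data, not taken from the notebooks.
documents = ["the cat sat", "the dog sat down"]
vocabulary, rows = count_vectorize(documents)
print(vocabulary)
print(rows)
print(tfidf(rows))
print(ngrams(tokenize(documents[0]), 2))
```

Terms that appear in every document (here "the" and "sat") get a weight of zero under this unsmoothed formula, which is the effect inverse document frequency weighting is meant to have on very common words.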

Note

You will probably need to install nltk in a virtual environment and then use that environment from Jupyter Notebook. If so, the commands you need are in instructions.txt.
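
Once the environment is set up, a quick check from a notebook cell can confirm that the kernel is running from the interpreter you expect and that nltk is importable from it. This is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical, and it does not replace the commands in instructions.txt. Note that the `in_virtualenv` flag relies on `sys.prefix` differing from `sys.base_prefix`, which holds for `venv`-style environments but may not for other environment managers.

```python
import importlib.util
import sys


def describe_environment(package="nltk"):
    """Report the running interpreter and whether `package` can be imported from it."""
    return {
        # Path of the Python executable backing this kernel or script.
        "interpreter": sys.executable,
        # True when running inside a venv-style virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # True when the package is discoverable on the current import path.
        "package_found": importlib.util.find_spec(package) is not None,
    }


print(describe_environment())
```

If `package_found` is false, the notebook is most likely using a different interpreter than the one nltk was installed into.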

License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.