NLP

This repository covers the basics of the Python Natural Language Toolkit (NLTK).

Prerequisites

This tutorial can help you understand the basics of natural language processing (NLP).

Table of contents

No.	Chapter	Description
1	NLP Basics	Overview of NLTK. Using regular expressions on text data. Machine learning pipelines for preprocessing.
2	Supplemental Data Cleaning	Introduction to, and comparison of, stemming and lemmatization.
3	Vectorizing Raw Data	Introduction to, and comparison of, vectorization methods: count vectorization, N-gram vectorization, and TF-IDF (term frequency-inverse document frequency) weighting.
4	Feature Engineering	Feature creation. Feature evaluation. Identifying candidate features for transformation. Box-Cox power transformation.
5	Building Machine Learning Classifiers	Cross-validation. Evaluation metrics. Holdout method. Random Forest classifier. Grid search for hyperparameter tuning. Gradient Boosting.
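Chapter 1 uses regular expressions to clean raw text before modelling. The sketch below chains a few typical cleaning steps (punctuation removal, lowercasing, regex tokenization, stopword removal) using only Python's standard library. The tiny stopword list is a hypothetical stand-in for NLTK's `stopwords` corpus, and `clean_text` is an illustrative name, not a function from this repository.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; NLTK's "stopwords" corpus
# provides a much fuller English list.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def clean_text(text):
    """Remove punctuation, lowercase, tokenize with a regex, drop stopwords."""
    # Step 1: drop every ASCII punctuation character.
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    # Step 2: lowercase, then split on runs of non-word characters (\W+).
    tokens = [tok for tok in re.split(r"\W+", no_punct.lower()) if tok]
    # Step 3: keep only tokens that are not stopwords.
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_text("NLTK is a leading platform for building Python programs!"))
# → ['nltk', 'leading', 'platform', 'building', 'python', 'programs']
```

Keeping each step as a separate line makes it easy to reorder or drop steps, which is the same idea a preprocessing pipeline formalizes.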
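Chapter 2 contrasts stemming, which strips suffixes by rule and can produce non-words, with lemmatization, which maps a word to its dictionary form. NLTK provides `PorterStemmer` and `WordNetLemmatizer` in `nltk.stem` for this; the dependency-free sketch below only illustrates the difference. The suffix rules and the mini-lexicon are hypothetical, and `toy_stem` is not the Porter algorithm.

```python
# Hypothetical suffix rules, tried in order. This is NOT the Porter algorithm.
SUFFIXES = ("ing", "ies", "es", "ed", "s")

# Hypothetical mini-lexicon standing in for a dictionary such as WordNet.
LEMMAS = {"running": "run", "ran": "run", "studies": "study", "geese": "goose", "better": "good"}


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["running", "studies", "geese", "cats"]:
    print(f"{w:8} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

Here the stemmer turns "running" into "runn" and "studies" into "stud", while the lemmatizer returns "run" and "study". The trade-off shows in the other two words: the lemmatizer handles the irregular "geese" but leaves "cats" alone because it is missing from the lexicon, whereas the rule-based stemmer reduces "cats" to "cat" but cannot touch "geese".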
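Chapter 3 compares count vectorization, N-grams, and TF-IDF weighting. The standard-library sketch below builds a document-term matrix from whitespace-tokenized documents, optionally over word N-grams, and then weights unigram counts with TF-IDF using tf = count / document length and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. That is one common variant. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf and L2-normalize rows by default, so their numbers differ. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(docs, n=1):
    """Return (vocabulary, matrix) where matrix[d][j] counts term j in doc d."""
    grams = [ngrams(doc.lower().split(), n) for doc in docs]
    vocab = sorted({g for doc_grams in grams for g in doc_grams})
    counts = [Counter(doc_grams) for doc_grams in grams]
    return vocab, [[c[term] for term in vocab] for c in counts]


def tfidf(docs):
    """Unigram TF-IDF with tf = count / doc length and idf = log(N / df)."""
    vocab, matrix = count_vectorize(docs)
    n_docs = len(docs)
    # df[j]: number of documents that contain term j at least once.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(len(vocab))]
    weights = []
    for row in matrix:
        length = sum(row)
        weights.append([
            (count / length) * math.log(n_docs / df[j]) if length else 0.0
            for j, count in enumerate(row)
        ])
    return vocab, weights


docs = ["nlp is fun", "nlp is hard", "python is fun"]
print(count_vectorize(docs))        # unigram counts
print(count_vectorize(docs, n=2))   # bigram counts
print(tfidf(docs))
```

A term that occurs in every document, such as "is" here, gets idf = log(1) = 0 and therefore weight 0, which is exactly the down-weighting of uninformative terms that TF-IDF is meant to provide.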
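Chapter 4 creates new features and applies the Box-Cox power transformation to skewed ones. For a strictly positive value x and a parameter λ, Box-Cox computes (x^λ - 1) / λ when λ ≠ 0 and ln(x) when λ = 0. The sketch below pairs that formula with one hypothetical engineered feature, the percentage of punctuation characters in a message; `punct_percent` and `box_cox` are illustrative names. `scipy.stats.boxcox` implements the same transform for arrays and can also estimate λ by maximum likelihood.

```python
import math
import string


def punct_percent(text):
    """Hypothetical feature: % of non-whitespace characters that are punctuation."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    punct = sum(1 for ch in chars if ch in string.punctuation)
    return round(100 * punct / len(chars), 1)


def box_cox(x, lam):
    """Box-Cox power transform of one value; x must be strictly positive."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for x > 0")
    if lam == 0:
        return math.log(x)          # limit of (x**lam - 1) / lam as lam -> 0
    return (x ** lam - 1) / lam


feature = punct_percent("Hi!!! How are you?")   # → 26.7 (4 of 15 non-space characters)
for lam in (-1, -0.5, 0, 0.5, 1, 2):
    print(lam, round(box_cox(feature, lam), 3))
```

Trying several λ values and comparing the resulting distributions is a simple way to pick a transformation by eye. Because Box-Cox is undefined at 0, a feature that is often exactly 0 (as punctuation percentage can be) would need a shift, such as adding a small constant, before transforming.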
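Chapter 5 evaluates classifiers with a holdout set, k-fold cross-validation, precision/recall/accuracy, and grid search. In practice these steps are usually done with scikit-learn (for example `train_test_split`, `KFold`, and `GridSearchCV`, with `RandomForestClassifier` or `GradientBoostingClassifier` as the model). To stay dependency-free, the sketch below swaps in a small k-nearest-neighbours classifier as a stand-in model; it is not Random Forest or Gradient Boosting. The toy data and every function name are hypothetical.

```python
import itertools
import random

# Hypothetical toy data: two well-separated clusters of 2-D points, labels 0/1.
DATA = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((2.0, 1.0), 0), ((1.0, 2.5), 0), ((2.5, 2.0), 0),
        ((6.0, 6.5), 1), ((7.0, 6.0), 1), ((6.5, 7.5), 1), ((7.5, 7.0), 1), ((6.0, 7.0), 1)]


def knn_predict(train, x, k, p):
    """Stand-in model: majority vote of the k nearest rows (Minkowski distance, power p)."""
    def dist(a):
        return sum(abs(u - v) ** p for u, v in zip(a, x)) ** (1 / p)
    nearest = sorted(train, key=lambda row: dist(row[0]))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle once, then hold out the last test_size fraction for final testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) - int(len(rows) * test_size)
    return rows[:cut], rows[cut:]


def k_fold(rows, k, seed=0):
    """Yield (train, test) pairs; every row lands in exactly one test fold."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    for i in range(k):
        test = rows[i::k]
        train = [row for j, row in enumerate(rows) if j % k != i]
        yield train, test


def scores(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, accuracy


def grid_search(rows, grid, k_folds=5):
    """Try every hyperparameter combination; keep the best mean CV accuracy."""
    best_params, best_score = None, -1.0
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_acc = []
        for train, test in k_fold(rows, k_folds):
            preds = [knn_predict(train, x, **params) for x, _ in test]
            fold_acc.append(scores([y for _, y in test], preds)[2])
        mean_acc = sum(fold_acc) / len(fold_acc)
        if mean_acc > best_score:
            best_params, best_score = params, mean_acc
    return best_params, best_score


# Tune on the training part only, then score once on the untouched holdout set.
train, test = holdout_split(DATA)
best_params, best_cv = grid_search(train, {"k": [1, 3, 5], "p": [1, 2]})
preds = [knn_predict(train, x, **best_params) for x, _ in test]
precision, recall, accuracy = scores([y for _, y in test], preds)
print("best:", best_params, "mean CV accuracy:", best_cv)
print("holdout precision/recall/accuracy:", precision, recall, accuracy)
```

The key design point is that cross-validation and grid search see only the training portion, so the holdout score stays an honest estimate of performance on unseen data.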

Note

You will probably need to install NLTK in a virtual environment and then use that environment in a Jupyter Notebook. If so, the required commands are in instructions.txt.
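If NLTK is installed in a virtual environment but the notebook kernel runs a different Python, `import nltk` can fail with `ModuleNotFoundError`. Running a quick standard-library check in a notebook cell shows which interpreter the kernel actually uses. `in_virtualenv` is an illustrative helper, not part of NLTK or Jupyter, and it does not install anything.

```python
import sys


def in_virtualenv():
    """True if this interpreter runs from a venv (or a legacy virtualenv)."""
    # venv leaves sys.base_prefix pointing at the base Python installation while
    # sys.prefix points at the environment; legacy virtualenv set sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the printed interpreter path is not inside your virtual environment, the notebook is not using the environment where NLTK was installed.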

License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

mmheydari97/nlp-warmup
