This repository covers the basics of the Python Natural Language Toolkit (NLTK).
The tutorial introduces the fundamentals of natural language processing (NLP).
Chapter | Title | Description |
---|---|---|
1 | NLP Basics | NLTK overview. Using regex for text data. Using Machine Learning pipelines for preprocessing. |
2 | Supplemental Data Cleaning | Introducing and comparing stemming and lemmatization. |
3 | Vectorizing Raw Data | Introducing and comparing vectorization methods such as count vectorization, N-grams, and inverse document frequency weighting. |
4 | Feature Engineering | Feature creation. Feature evaluation. Identifying candidate features for transformation. Box-Cox power transformation. |
5 | Building Machine Learning Classifiers | Cross-validation. Evaluation metrics. Holdout method. Random Forest classifier. Grid search for hyperparameter tuning. Gradient Boosting method. |
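
To give a feel for the vectorization methods listed for Chapter 3, here is a minimal pure-Python sketch of count vectorization, N-grams, and inverse document frequency weighting. It is not the code from the notebooks: the function names (`tokenize`, `ngrams`, `count_vectorize`, `tfidf`), the toy corpus, and the plain `log(N / df)` weighting are all assumptions made for illustration. Library implementations typically use smoothed variants of the weighting, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on runs of non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(docs, n=1):
    """Build a sorted vocabulary and a document-term count matrix."""
    counts = [Counter(ngrams(tokenize(doc), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tfidf(matrix):
    """Weight raw counts by idf = log(N / df), an unsmoothed textbook form."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for row in matrix
    ]


# Hypothetical toy corpus, used only for this illustration.
docs = ["I love NLP", "I love Python", "NLP is fun"]

vocab, counts = count_vectorize(docs)         # unigram counts
weighted = tfidf(counts)                      # counts scaled by idf
bigram_vocab, _ = count_vectorize(docs, n=2)  # vocabulary of 2-grams

print(vocab)
print(counts)
print(bigram_vocab)
```

Terms that occur in only one document (such as "python" here) receive a larger weight than terms shared by several documents, which is the intuition behind inverse document frequency weighting.
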
You will probably need to install NLTK in a virtual environment and then use that environment from a Jupyter Notebook. The commands for doing so are in instructions.txt.
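
Once the environment is set up, a quick check run in a notebook cell can confirm that the kernel is using it. The sketch below is an assumption-laden convenience, not part of instructions.txt: the helper names `in_virtualenv` and `nltk_available` are hypothetical, and the prefix comparison only detects venv/virtualenv-style environments (a conda environment may report `False`).

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def nltk_available():
    """Return True when the nltk package can be found by this interpreter."""
    return importlib.util.find_spec("nltk") is not None


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
print("nltk importable:", nltk_available())
```

If the interpreter path points outside the virtual environment, the notebook is probably running on a different kernel than the one you installed NLTK into.
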
This project is licensed under the Apache License 2.0; see the LICENSE file for details.