/nlp_training

Natural Language Processing

spaCy -> Open source natural language processing library for Python. For each task it provides a single implementation, chosen to be the most efficient one available.

NLTK -> Natural Language Toolkit, a very popular open source NLP library for Python.

For most common NLP tasks, spaCy is much faster than NLTK. NLTK, in contrast, offers a variety of implementations to choose from for each task.

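Download the English language model with spaCy (spaCy must be installed first). The short name en from older releases is deprecated as of spaCy 3.0, so the full package name en_core_web_sm is used here.

$ python -m spacy download en_core_web_sm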
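
Once the download has finished, loading the model and tokenizing a sentence might look like the sketch below. The package name en_core_web_sm and the helper load_english are assumptions made for illustration. The fallback to spacy.blank("en") only provides tokenization, not tagging or lemmatization.

```python
import spacy


def load_english(model_name="en_core_web_sm"):
    """Load an English pipeline; fall back to a tokenizer-only pipeline."""
    try:
        return spacy.load(model_name)
    except OSError:
        # The model package is not installed, so use a blank English
        # pipeline, which still knows how to split text into tokens.
        return spacy.blank("en")


nlp = load_english()
doc = nlp("spaCy splits text into tokens.")
tokens = [token.text for token in doc]
print(tokens)
```

Each item in doc is a Token object. With the full model loaded, attributes such as token.pos_ and token.lemma_ are filled in as well.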

NLP Basics

NLP -> natural language processing, i.e. processing of text. Text data is unstructured.

Example use cases:
-> Classifying emails as spam vs. legitimate
-> Sentiment analysis of text (e.g. movie reviews)
-> Analyzing trends from written customer feedback forms
-> Understanding text commands
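
Because raw text is unstructured, a common first step is to turn it into something countable. The sketch below uses only the Python standard library. The tokenize helper and the sample feedback string are made up for illustration and are much cruder than what spaCy or NLTK provide.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical customer feedback, used only as sample input.
feedback = "Great product. The delivery was slow, but the product is great!"

tokens = tokenize(feedback)
counts = Counter(tokens)  # maps each token to how often it occurs
print(counts.most_common(3))
```

Word counts like these are the kind of structured input that a spam classifier or a trend analysis could start from.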

Sometimes, before NLP is applied to a text, stemming is done to reduce the number of distinct words by cutting each word down to its stem. Another technique is lemmatization, which looks at the surrounding text to determine the base form (lemma) that a word should be reduced to.
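
The difference between the two techniques can be sketched with a toy example. This is not the algorithm used by spaCy or NLTK: naive_stem is a hypothetical suffix stripper, and naive_lemmatize relies on a tiny hand-written lookup table keyed by part of speech, which stands in for the context a real lemmatizer would use.

```python
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix without looking at any context."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table: (word, part of speech) -> lemma.
LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("was", "VERB"): "be",
}


def naive_lemmatize(word, pos):
    """Return the lemma for a word given its part of speech."""
    return LEMMAS.get((word, pos), word)


print(naive_stem("meeting"))               # stem ignores context
print(naive_lemmatize("meeting", "NOUN"))  # noun: "a meeting"
print(naive_lemmatize("meeting", "VERB"))  # verb: "we are meeting"
```

The stemmer always cuts "meeting" the same way, while the lemmatizer gives a different result depending on how the word is used. In practice one would reach for NLTK's PorterStemmer or spaCy's token.lemma_ instead of these toy functions.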

Stop Words => common words which don't give you extra information: "a", "the", "what"

Resources

https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment