pycon2020

Natural Language Processing (NLP) in Python tutorial given for PyCon 2020 remote conference.

Link to video: https://youtu.be/vyOgWhwUmec

Resources

Here is a list of resources for the topics covered in the video.

Good libraries for NLP:

Bag of words

Overview: https://machinelearningmastery.com/gentle-introduction-bag-words-model/
scikit-learn code: https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction
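
As a rough illustration of the bag-of-words idea, here is a minimal sketch in plain Python. It is not the scikit-learn implementation linked above; the `bag_of_words` helper, the whitespace tokenization, and the sample sentences are assumptions made for this example only.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of strings.

    Hypothetical helper: lowercases each document and splits on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is every distinct token, sorted so column order is stable.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; words absent from the document count as 0.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


docs = ["I love the book", "I love the story of the book"]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each document becomes a fixed-length vector of word counts, so word order is discarded and only frequencies remain.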

Word Vectors

Overview: https://medium.com/@jayeshbahire/introduction-to-word-vectors-ea1d4e4b84bf
spaCy info: https://spacy.io/usage/vectors-similarity
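
Word vectors are usually compared with cosine similarity. The sketch below shows that comparison in plain Python; the three-dimensional vectors are made-up toy values (real word vectors have hundreds of dimensions and come from a trained model), and `cosine_similarity` is a hypothetical helper, not a spaCy function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
king_apple = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])
print(king_queen > king_apple)
```

Vectors pointing in similar directions score close to 1, which is how related words end up "near" each other.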

Regexes

Python overview: https://docs.python.org/3/howto/regex.html
Regex Cheatsheet: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
Regex tester: https://regex101.com/
Regex golf (to practice): https://alf.nu/RegexGolf
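
For a quick feel of Python's `re` module, here is a small sketch that pulls phone-number-like strings out of text. The pattern and the sample sentence are assumptions chosen for illustration.

```python
import re

# Three digits, three digits, four digits, separated by hyphens.
# \b keeps the match from starting or ending inside a longer token.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

text = "Call 555-123-4567 or 555-987-6543 after 5pm."
numbers = [match.group(0) for match in PHONE.finditer(text)]
print(numbers)

# re.match only matches at the start of the string.
starts_with_ab = re.match(r"ab\w*", "abstract") is not None
print(starts_with_ab)
```

Pasting the same pattern into the regex tester linked above is an easy way to see which parts of the text each token matches.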

Stemming/Lemmatizing

Overview & NLTK Code: https://www.guru99.com/stemming-lemmatization-python-nltk.html
spaCy: https://spacy.io/api/lemmatizer

Stopwords

Quick overview + code: https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
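
Stop-word removal itself is just a filter over tokens. The sketch below uses a tiny hand-written stop-word set as a stand-in for a real list such as NLTK's; the set contents and the `remove_stop_words` helper are assumptions for this example.

```python
# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop tokens found in STOP_WORDS."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


filtered = remove_stop_words("The plot of the movie is a mess")
print(filtered)
```

Dropping these high-frequency words shrinks the vocabulary and lets more informative words dominate features such as bag-of-words counts.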

Parts of speech

TextBlob usage: https://textblob.readthedocs.io/en/dev/api_reference.html
List of tags: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Transformers:

Attention is all you need: https://arxiv.org/pdf/1706.03762.pdf
Good overview of these architectures: https://www.youtube.com/watch?v=TQQlZhbC5ps
Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/

Transformer Types:

BERT: https://arxiv.org/pdf/1810.04805.pdf
OpenAI GPT-2: https://openai.com/blog/better-language-models/