This is the curriculum for "Learn Natural Language Processing" by Siraj Raval on YouTube. After completing this course, start your own startup, do consulting work, or find a full-time job related to NLP. Remember to believe in your ability to learn. You can learn NLP, you will learn NLP, and if you stick to it, eventually you will master it.
Find a study buddy: join the #NLP_curriculum channel in our Slack group at http://wizards.herokuapp.com
- Video Lectures
- Reading Assignments
- Project(s)
- 8 weeks
- 2-3 Hours of Study per Day
- Python, PyTorch, NLTK
- Learn Python https://www.edx.org/course/introduction-python-data-science-2
- Statistics http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf
- Probability https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf
- Calculus http://tutorial.math.lamar.edu/pdf/Calculus_Cheat_Sheet_All.pdf
- Linear Algebra https://www.souravsengupta.com/cds2016/lectures/Savov_Notes.pdf
- Overview of NLP (Pragmatics, Semantics, Syntax, Morphology)
- Text preprocessing (stemming, lemmatization, tokenization, stopword removal)
- Videos 1 through 2.5 at https://web.stanford.edu/~jurafsky/slp3/
- https://www.youtube.com/watch?v=hyT-BzLyVdU&list=PLDcmCgguL9rxTEz1Rsy6x5NhlBjI8z3Gz
- Ch. 1-2 of Speech and Language Processing, 3rd ed., plus the accompanying slides
- Work through notebooks 1-1 to 3-4 of https://github.com/hb20007/hands-on-nltk-tutorial to learn NLTK
- Then use NLTK to perform stemming, lemmatization, tokenization, and stopword removal on a dataset of your choice (a minimal sketch follows below)
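
If it helps to see the shape of that project, here is a minimal sketch of the NLTK pipeline; the sample sentence is just a placeholder, so swap in your own dataset.

```python
# Minimal NLTK preprocessing sketch: tokenization, stopword removal,
# stemming, and lemmatization on a placeholder sentence.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the required NLTK resources
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The striped bats were hanging on their feet and eating the best berries."

tokens = word_tokenize(text.lower())                                   # tokenization
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]  # stopword removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print("stems: ", [stemmer.stem(t) for t in filtered])                  # stemming
print("lemmas:", [lemmatizer.lemmatize(t) for t in filtered])          # lemmatization
```
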
- Lexicons
- Pre-deep-learning statistical language models (HMMs, topic modeling with LDA)
- Parts 4, 6, 7, 8, 9, and 10 of the UWash course
- Build a Hidden Markov Model for weather prediction in PyTorch, following https://github.com/TreB1eN/HiddenMarkovModel_Pytorch (see the forward-algorithm sketch below)
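
As a warm-up for that project, here is a minimal sketch (not the linked repo's code) of an HMM forward pass in PyTorch for a toy weather model; the state names and all probabilities are made-up illustrative numbers.

```python
# Toy HMM: hidden states {Sunny, Rainy}, observations {Walk, Shop, Clean}.
# All probabilities are made-up numbers for illustration only.
import torch

T = torch.tensor([[0.7, 0.3],        # transitions: P(next state | current state)
                  [0.4, 0.6]])
E = torch.tensor([[0.6, 0.3, 0.1],   # emissions: P(observation | state)
                  [0.1, 0.4, 0.5]])
pi = torch.tensor([0.6, 0.4])        # initial state distribution

def forward_likelihood(observations):
    """Likelihood of an observation sequence via the forward algorithm."""
    alpha = pi * E[:, observations[0]]   # initialize with the first observation
    for o in observations[1:]:
        alpha = (alpha @ T) * E[:, o]    # propagate through transitions, re-weight by emission
    return alpha.sum()

print(forward_likelihood([0, 1, 2]))     # P(Walk, Shop, Clean)
```
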
- Suggested readings from the Stanford course
- 3 assignments: visualize and implement Word2Vec, and create a dependency parser, all in PyTorch (these are assignments from the Stanford course); a skip-gram sketch follows below
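
For the Word2Vec piece, a minimal skip-gram sketch in PyTorch might look like the following; the toy vocabulary, (center, context) pairs, and dimensions are placeholders, not the Stanford assignment's exact setup.

```python
# Minimal skip-gram Word2Vec with a full softmax over a toy vocabulary.
# In practice, (center, context) pairs come from sliding a window over a corpus.
import torch
import torch.nn as nn

vocab = ["the", "cat", "sat", "on", "mat"]
word2idx = {w: i for i, w in enumerate(vocab)}
pairs = [("cat", "the"), ("cat", "sat"), ("sat", "on"), ("on", "mat")]  # toy window pairs

class SkipGram(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # center-word vectors (the embeddings we want)
        self.out = nn.Linear(dim, vocab_size)        # scores over possible context words

    def forward(self, center):
        return self.out(self.embed(center))

model = SkipGram(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

centers = torch.tensor([word2idx[c] for c, _ in pairs])
contexts = torch.tensor([word2idx[c] for _, c in pairs])

for epoch in range(100):                 # tiny training loop
    optimizer.zero_grad()
    loss = loss_fn(model(centers), contexts)
    loss.backward()
    optimizer.step()

print(model.embed.weight[word2idx["cat"]])   # learned vector for "cat"
```
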
- Sequence to Sequence Models (translation, summarization, question answering)
- Attention based models
- Deep Semantic Similarity
- Read this on Deep Semantic Similarity Models https://kishorepv.github.io/DSSM/
- Ch 10 Deep Learning Book on Sequence Modeling http://www.deeplearningbook.org/contents/rnn.html
- 3 assignments: create a translator and a summarizer, both as seq2seq models in PyTorch (a minimal encoder-decoder sketch follows below)
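
Both assignments share the same encoder-decoder skeleton. Here is a minimal GRU sketch; the vocabulary sizes, hidden dimensions, and random batches are placeholders, and a real solution would add attention, teacher forcing, and a training loop over a parallel corpus.

```python
# Minimal seq2seq skeleton: a GRU encoder compresses the source sequence into a
# hidden state, which conditions a GRU decoder that emits target-vocabulary logits.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        _, h = self.gru(self.embed(src))
        return h                                  # final hidden state summarizes the source

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tgt, h):                    # tgt: (batch, tgt_len)
        out, h = self.gru(self.embed(tgt), h)
        return self.out(out), h                   # logits over the target vocabulary

encoder, decoder = Encoder(5000), Decoder(5000)
src = torch.randint(0, 5000, (2, 7))              # fake source batch
tgt = torch.randint(0, 5000, (2, 5))              # fake target batch (shifted right in practice)
logits, _ = decoder(tgt, encoder(src))
print(logits.shape)                               # (2, 5, 5000)
```
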
- Speech Recognition
- Dialog Managers, NLU
- Ch 24 of this book https://web.stanford.edu/~jurafsky/slp3/24.pdf
- Create a dialogue system in PyTorch using https://github.com/ywk991112/pytorch-chatbot, and a task-oriented dialogue system for ordering food using Dialogflow
- My videos on BERT, GPT-2, and how to build a biomedical startup. Search "Siraj Raval BERT", "Siraj Raval GPT-2", and "How to Build a Biomedical Startup" to find them on YouTube.
- Transfer learning with BERT/GPT-2/ELMo
- http://ruder.io/nlp-imagenet/
- https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html
- http://jalammar.github.io/illustrated-bert/
- Play with https://github.com/huggingface/pytorch-pretrained-BERT#examples: pick 2 models, use each on one of the 9 downstream tasks, and compare their results (a feature-extraction sketch follows below)
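
As a starting point, here is a rough feature-extraction sketch following the pattern in that repo's README; the model name and example sentence are just illustrative, and the package has since been renamed to transformers, so double-check the current API against the README.

```python
# Extract contextual BERT features with the pytorch-pretrained-BERT package
# (pattern taken from the repo's README; verify against the current docs).
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "[CLS] natural language processing is fun [SEP]"
tokens = tokenizer.tokenize(text)
token_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
segment_ids = torch.zeros_like(token_ids)            # single-sentence input, so all segment 0

with torch.no_grad():
    encoded_layers, pooled = model(token_ids, segment_ids)

print(len(encoded_layers), encoded_layers[-1].shape)  # 12 layers of (1, seq_len, 768) vectors
```
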
- Visual Semantics
- Deep Reinforcement Learning
- CMU Video https://www.youtube.com/watch?v=isxzsAelQX0
- Module 5-6 of this https://www.edx.org/course/natural-language-processing-nlp-3
- https://cs.stanford.edu/people/karpathy/cvpr2015.pdf
- Hilarious https://medium.com/@yoav.goldberg/an-adversarial-review-of-adversarial-generation-of-natural-language-409ac3378bd7
- Policy-gradient text summarization: https://github.com/yaserkl/RLSeq2Seq#policy-gradient-w-self-critic-learning-and-temporal-attention-and-intra-decoder-attention. Reimplement it in PyTorch (a conceptual loss sketch follows below).
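
To get oriented before reimplementing, here is a conceptual sketch (not the repo's code) of the self-critical policy-gradient loss; the function name and the toy numbers are made up, and in a real model the rewards would be ROUGE scores of a sampled versus a greedily decoded summary.

```python
# Self-critical policy gradient: REINFORCE with the greedy decode as the baseline.
# Toy numbers stand in for a decoder's token probabilities and ROUGE rewards.
import torch

def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """sample_log_probs: (seq_len,) log-probs of the sampled summary's tokens."""
    advantage = sample_reward - greedy_reward      # how much better sampling did than greedy
    return -advantage * sample_log_probs.sum()     # minimizing raises log-probs when advantage > 0

probs = torch.tensor([0.4, 0.6, 0.5], requires_grad=True)   # stand-in for decoder outputs
loss = self_critical_loss(torch.log(probs), sample_reward=0.32, greedy_reward=0.28)
loss.backward()                                    # gradients would flow back into the decoder
print(loss.item(), probs.grad)
```
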