
Natural Language Processing 817

Course material

# Note Videos
## 1. Introduction to natural language processing
- What is natural language processing? (15 min)

## 2. Text normalisation, units and edit distance
- A first NLP example (8 min)
- Text normalisation and tokenisation (8 min)
- Words (12 min)
- Morphology (5 min)
- Stems and lemmas (3 min)
- Byte-pair encoding (BPE) (9 min)
- Edit distance (20 min)

## 3. Language modelling with N-grams
- The language modelling problem (10 min)
- N-gram language models (13 min)
- Start and end of sentence tokens in language models (11 min)
- Why use log in language models? (4 min)
- Evaluating language models using perplexity (9 min)
- Language model smoothing intuition (6 min)
- Additive smoothing in language models (10 min)
- Absolute discounting in language models (5 min)
- Language model interpolation (11 min)
- Language model backoff (4 min)
- Kneser-Ney smoothing (8 min)
- Are N-gram language models still used today? (2 min)

## 4. Entropy and perplexity (advanced)
- What are perplexity and entropy? (14 min)

## 5. Hidden Markov models
- A first hidden Markov model example (14 min)
- Hidden Markov model definition (9 min)
- The three HMM problems (3 min)
- The Viterbi algorithm for HMMs (24 min)
- Viterbi HMM example (19 min)
- Why do we want the marginal probability in an HMM? (7 min)
- The forward algorithm for HMMs (19 min)
- Learning in HMMs (8 min)
- Hard expectation maximisation for HMMs (12 min)
- Soft expectation maximisation for HMMs (20 min)
- Why expectation maximisation works (12 min)
- The log-sum-exp trick (9 min)
- Hidden Markov models in practice (4 min)

## 6. Expectation maximisation (advanced)

## 7. Word embeddings
- Why word embeddings? (9 min)
- One-hot word embeddings (6 min)
- Skip-gram introduction (7 min)
- Skip-gram loss function (8 min)
- Skip-gram model structure (8 min)
- Skip-gram optimisation (10 min)
- Skip-gram as a neural network (10 min)
- Skip-gram example (2 min)
- Continuous bag-of-words (CBOW) (6 min)
- Skip-gram with negative sampling (16 min)
- GloVe word embeddings (12 min)
- Evaluating word embeddings (21 min)

## 8. Introduction to neural networks
- Playlist
- Video list

## 9. Recurrent neural networks
- From feedforward to recurrent neural networks (15 min)
- RNN language model loss function (9 min)
- RNN definition and computational graph (3 min)
- Backpropagation through time (25 min)
- Vanishing and exploding gradients in RNNs (13 min)
- Solutions to exploding and vanishing gradients (in RNNs) (10 min)
- Extensions of RNNs (8 min)

## 10. Encoder-decoder models and attention
- A basic encoder-decoder model for machine translation (13 min)
- Training and loss for encoder-decoder models (10 min)
- Encoder-decoder models in general (18 min)
- Greedy decoding (5 min)
- Beam search (18 min)
- Basic attention (22 min)
- Attention - More general (13 min)
- Evaluating machine translation with BLEU (23 min)

## 11. Self-attention and transformers
- Intuition behind self-attention (12 min)
- Attention recap (6 min)
- Self-attention details (13 min)
- Self-attention in matrix form (5 min)
- Positional encodings in transformers (19 min)
- The clock analogy for positional encodings (5 min)
- Multi-head attention (5 min)
- Masking the future in self-attention (5 min)
- Cross-attention (7 min)
- Transformer (4 min)

## 12. Large language models
- Intro to large language models by Andrej Karpathy (1 h)
- Large language model training and inference (14 min)
- The difference between GPT and ChatGPT (13 min)
- Reinforcement learning from human feedback (15 min)

# Acknowledgements

With permission, I have used content from the NLP courses taught by Jan Buys (University of Cape Town) and Sharon Goldwater (University of Edinburgh).

# License

Herman Kamper, 2022-2024
This work is released under a Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0).