TParcollet/nlp-practical-1-2022

Sentiment analysis with RNN and word2vec from scratch

Overall steps are:

Set up the environment (10min)
Prepare the dataset (10min)
Implement CBOW and Skip-gram from scratch in PyTorch. (30min)
Train the models with wikitext-2 (30min)
Visualise the embeddings of a few chosen words. (20min)
Develop a classification pipeline based on a simple RNN on IMDB (without word2vec yet). (30min)
Replace the input features with word embeddings and compare. (20min)
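
As a rough illustration of the difference between CBOW and Skip-gram, here is a minimal sketch in plain Python of how training examples could be built from a tokenised sentence. The function names (`context_windows`, `cbow_examples`, `skipgram_examples`), the example sentence and the window size are assumptions made for this sketch, not the practical's solution. The PyTorch models themselves are left out.

```python
def context_windows(tokens, window=2):
    """Yield (center, context) pairs, where context is the list of
    up to `window` tokens on each side of the center token."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right


def cbow_examples(tokens, window=2):
    # CBOW: the input is the whole context, the target is the center word.
    return [(context, center) for center, context in context_windows(tokens, window)]


def skipgram_examples(tokens, window=2):
    # Skip-gram: the input is the center word, and each context word
    # gives one separate (input, target) pair.
    return [
        (center, ctx)
        for center, context in context_windows(tokens, window)
        for ctx in context
    ]


# Hypothetical toy sentence, tokenised on whitespace.
tokens = "the movie was really good".split()
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)

print(cbow[1])       # (['the', 'was'], 'movie')
print(skipgram[:2])  # [('the', 'movie'), ('movie', 'the')]
```

In a real pipeline the tokens would first be mapped to integer indices from a vocabulary, so that the examples can be fed to an embedding layer.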

The process is as follows: each task is assigned a time slot, and at the end of each slot we will go through the solution together.