The overall steps are:
- Set up the environment (10 min).
- Prepare the dataset (10 min).
- Implement CBOW and Skip-gram from scratch in PyTorch (30 min).
- Train the models on WikiText-2 (30 min).
- Visualise the embeddings of a few selected words (20 min).
- Develop a simple classification pipeline on IMDB, without word2vec for now (30 min).
- Replace the input features with word embeddings and compare the results (20 min).
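The CBOW and Skip-gram steps depend on how training examples are built from running text. Here is a minimal sketch, assuming the corpus is already tokenized into a list of strings and using a hypothetical `window` parameter for the context size. Skip-gram turns each centre word into several (centre, context) pairs. CBOW groups the context around each centre word into a single (context, centre) example. The function names are illustrative and are not part of any library.

```python
from typing import List, Tuple


def make_skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: predict each context word from the centre word -> (centre, context) pairs."""
    pairs = []
    for i, centre in enumerate(tokens):
        # Context positions within `window` on each side, clipped at the sequence edges.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def make_cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: predict the centre word from its surrounding context -> (context, centre) pairs."""
    pairs = []
    for i, centre in enumerate(tokens):
        # Collect the (possibly truncated) context window around position i.
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            pairs.append((context, centre))
    return pairs


tokens = "the cat sat on the mat".split()
print(make_skipgram_pairs(tokens, window=1)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(make_cbow_pairs(tokens, window=1)[1])       # (['the', 'sat'], 'cat')
```

In this sketch the context is simply truncated at the edges of the sequence. A PyTorch version would also map each word to an integer id before passing it to an embedding layer.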
The process is as follows: each task has a fixed time slot, and at the end of each slot we will go through the solution together.
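The final step replaces the classifier's input features with word embeddings. One common way to do this is to average the vectors of the words in each review, which produces a fixed-size feature vector. The sketch below assumes the embeddings are available as a plain dict from word to vector. The `average_embedding` helper and the `toy` vectors are hypothetical, and out-of-vocabulary words are simply skipped.

```python
from typing import Dict, List


def average_embedding(tokens: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # Component-wise mean over the known word vectors.
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Toy 2-d vectors for illustration only; real ones would come from the trained model.
toy = {"great": [1.0, 0.0], "film": [0.0, 1.0], "awful": [-1.0, 0.0]}
print(average_embedding("a great film".split(), toy, dim=2))  # "a" is out of vocabulary -> [0.5, 0.5]
```

Averaging discards word order, but it gives a fixed-size input that can be fed directly to the same simple classifier used as the baseline.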