- This notebook is directly inspired by the paper "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation".
- Training requires a parallel Nepali-English corpus. The data was obtained from the TED2020 corpus.
- The corpus contains 4184 sentence pairs and 0.12M words. The output, including the model and the evaluation data, is stored separately in the same GitHub repository.
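
To make the training idea concrete, here is a minimal sketch of the distillation objective described in the paper: a student model is trained so that its embeddings of both the English sentence and its Nepali translation match the teacher's embedding of the English sentence, using mean squared error. The function names (`mse`, `distillation_loss`) and the toy 2-dimensional vectors are illustrative assumptions, not code from this notebook; a real run would use embeddings produced by the teacher and student models.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher_src: Sequence[Vector],
    student_src: Sequence[Vector],
    student_tgt: Sequence[Vector],
) -> float:
    """Batch average of MSE(teacher(src), student(src)) + MSE(teacher(src), student(tgt)).

    teacher_src: teacher embeddings of the English (source) sentences
    student_src: student embeddings of the same English sentences
    student_tgt: student embeddings of the Nepali (target) translations
    """
    total = 0.0
    for t, s_src, s_tgt in zip(teacher_src, student_src, student_tgt):
        total += mse(t, s_src) + mse(t, s_tgt)
    return total / len(teacher_src)


# Toy embeddings for a single sentence pair (hypothetical values).
teacher_en = [[1.0, 0.0]]
student_en = [[1.0, 0.0]]  # student already matches the teacher on English
student_ne = [[0.0, 0.0]]  # student is still off on the Nepali translation
loss = distillation_loss(teacher_en, student_en, student_ne)
print(loss)
```

In this toy case only the Nepali term contributes to the loss, which is the signal that pulls the student's Nepali embeddings toward the teacher's English embedding space.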