NepaliSense-DistillBERT

This notebook is directly inspired by the paper "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation" (Reimers and Gurevych, 2020).
Training requires a parallel Nepali-English corpus, which we obtained from the TED2020 corpus.
It contains 4,184 sentence pairs and 0.12M words. The output, including the model and the evaluation data, is stored separately in the same GitHub repository.

Supriya090/NepaliSense-DistillBERT
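
To make the training idea concrete, here is a minimal sketch of the distillation objective described in the paper: a student model is trained so that its embeddings of both the English sentence and its Nepali translation match the teacher's embedding of the English sentence. The function names (`mse`, `distillation_loss`) and the toy vectors are illustrative assumptions, not code from this repository, and the sketch uses plain Python lists in place of real model outputs.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher_en: Sequence[Vector],
    student_en: Sequence[Vector],
    student_ne: Sequence[Vector],
) -> float:
    """Batch-averaged distillation loss (hypothetical helper).

    For each sentence pair, the student's English embedding and the
    student's Nepali embedding are both pulled toward the teacher's
    English embedding.
    """
    n = len(teacher_en)
    if n == 0 or not (n == len(student_en) == len(student_ne)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, s_en, s_ne in zip(teacher_en, student_en, student_ne):
        total += mse(t, s_en) + mse(t, s_ne)
    return total / n


# Toy example: one sentence pair with 2-dimensional embeddings.
teacher = [[1.0, 0.0]]      # teacher embedding of the English sentence
student_en = [[1.0, 0.0]]   # student already matches on English
student_ne = [[0.0, 0.0]]   # student is still off on Nepali
loss = distillation_loss(teacher, student_en, student_ne)
```

In this toy case only the Nepali term contributes to the loss, which is the signal that drives the student to map Nepali sentences to the same point in the embedding space as their English translations.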