This repository contains state-of-the-art language models and a classifier for Telugu, a language spoken in the Indian subcontinent.
The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).

Architecture/Dataset | Telugu Wikipedia Articles |
---|---|
ULMFiT | 27.47 |
TransformerXL | 29.44 |

Dataset | Accuracy | Kappa Score |
---|---|---|
Telugu News Articles | 95.4 | 93.8 |
Telugu News Articles - Andhra Jyoti | 92.09 | |
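
For readers unfamiliar with the Kappa score column, the following is a minimal sketch of how accuracy and Cohen's kappa can be computed from true and predicted labels. It is not the evaluation code used in this repository. The function name `accuracy_and_kappa` and the toy labels are assumptions made for illustration. The values in the table appear to be on a 0-100 scale, whereas this sketch returns fractions between 0 and 1.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences.

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    # Observed agreement: fraction of examples where prediction matches truth.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected agreement by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy labels (assumed, not taken from the Telugu news datasets).
toy_true = ["sports", "sports", "business", "business"]
toy_pred = ["sports", "sports", "business", "sports"]
print(accuracy_and_kappa(toy_true, toy_pred))  # (0.75, 0.5)
```

Kappa corrects accuracy for agreement expected by chance, which is why it is lower than accuracy in the table above.
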

Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |
TransformerXL | Embeddings projection |

Download the pretrained language model from here.

Download the classifier from here.

The tokenizer was trained using Google's SentencePiece.
Download the trained tokenizer model and vocabulary from here.
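
Once the tokenizer model is downloaded, it can be loaded with the `sentencepiece` Python package. The following is a minimal sketch, assuming `sentencepiece` is installed (`pip install sentencepiece`). The function name `tokenize_telugu` is hypothetical, and the model path is a placeholder to be replaced with the location of the downloaded file.

```python
from pathlib import Path


def tokenize_telugu(text, model_path):
    """Split text into subword pieces using a trained SentencePiece model.

    Hypothetical helper for illustration; model_path must point to the
    downloaded tokenizer model file.
    """
    model_file = Path(model_path)
    if not model_file.is_file():
        raise FileNotFoundError(f"SentencePiece model not found: {model_file}")

    # Imported here so the path check above works even without the package.
    import sentencepiece as spm

    processor = spm.SentencePieceProcessor()
    processor.Load(str(model_file))
    # Returns a list of subword strings.
    return processor.EncodeAsPieces(text)
```

A call would look like `tokenize_telugu("తెలుగు భాష", "path/to/tokenizer.model")`, where the path is an assumption and depends on where the file was saved.
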