
Sample multilingual data and tools for creating the data, used for multilingual NLP research

Apache License 2.0

sungai

Sungai (pronounced soon-nai) means "river" in Malay. It is a sample multilingual dataset intended for multilingual NLP model distillation. "mdd" stands for "multilingual distillation dataset".