Sungai (pronounced soon-nai) means "river" in Malay. It is a sample multilingual dataset intended for distilling multilingual NLP models. "mdd" stands for "multilingual distillation dataset".
huu4ontocord/sungai
Sample multilingual data and tools for creating the data, used for multilingual NLP research
Apache-2.0