IndoNLP
We are researchers working to raise the lower bound of the Indonesian NLP standard. We collaborate to release new data resources and benchmarks.
Pinned Repositories
.github
Landing page
cendol
Indonesian T0 | Instruction-tuning for low-resource and extremely low-resource Austronesian languages
indonlg
The first-ever large-scale natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained IndoGPT and IndoBART models, and starter code! (EMNLP 2021)
indonlp.github.io
indonlu
The first-ever large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
nusa-catalogue
Dataset Catalogue Homepage for Indonesian Languages
nusa-crowd
A collaborative project to collect datasets in Indonesian languages.
nusa-writes
NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
nusacrowd-asr
NusaCrowd ASR Experiment
nusax
High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)