indonesian-NLP-resources

Data resources for Indonesian NLP

MIT License

Sentences Dataset

  1. leipzig: Indonesian sentence collection (news articles, web articles, and Wikipedia data from 2008-2016)
  2. wn-msa.sourceforge.net: Wordnet Bahasa
  3. Quran: Indonesian Quran translations (id.muntakhab, id.jalalayn, id.indonesian)
  4. Kompas online collection. This corpus contains Kompas online news articles from 2001-2002. See here for more info and citations.
  5. Tempo online collection. This corpus contains Tempo online news articles from 2000-2002. See here for more info and citations.
  6. corpus-frog-storytelling: spoken-text storytelling
  7. TED-Multilingual-Parallel-Corpus: Monolingual_data/Indonesian
  8. Opus: Opus NLPL
  9. Sealang: Sealang dataset

Word reference (kemdikbud) link

  1. Base entries (Entri Dasar) : 48,748 (44.64%)
  2. Derived words (Kata Turunan) : 26,312 (24.09%)
  3. Compound words (Gabungan Kata) : 30,625 (28.04%)
  4. Proverbs (Peribahasa) : 2,040 (1.87%)
  5. Figurative expressions (Kiasan) : 268 (0.25%)
  6. Idioms (Ungkapan) : 1,129 (1.03%)
  7. Variants (Varian) : 91 (0.08%)
  8. Total entries : 109,213 (100.00%)
  9. Total senses : 127,775
  10. Total examples : 29,495
  11. Total categories : 255
  12. Senses per entry : 1.170
  13. Examples per sense : 0.231
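
The figures above are internally consistent, and a short script can confirm that. The following is a minimal sketch that recomputes the percentages and the two ratios from the raw counts in the list. The variable and function names are illustrative only and do not come from any kemdikbud tool.

```python
# Entry counts copied from the word reference list above.
# The Indonesian label for each category is given in the comment.
entry_counts = {
    "base entries": 48748,            # Entri Dasar
    "derived words": 26312,           # Kata Turunan
    "compound words": 30625,          # Gabungan Kata
    "proverbs": 2040,                 # Peribahasa
    "figurative expressions": 268,    # Kiasan
    "idioms": 1129,                   # Ungkapan
    "variants": 91,                   # Varian
}
TOTAL_ENTRIES = 109213   # Entri Total
TOTAL_SENSES = 127775    # Makna Total
TOTAL_EXAMPLES = 29495   # Contoh Total


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# The seven categories should add up to the reported total.
assert sum(entry_counts.values()) == TOTAL_ENTRIES

shares = {name: share(n, TOTAL_ENTRIES) for name, n in entry_counts.items()}
senses_per_entry = round(TOTAL_SENSES / TOTAL_ENTRIES, 3)
examples_per_sense = round(TOTAL_EXAMPLES / TOTAL_SENSES, 3)

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
print(f"senses per entry: {senses_per_entry:.3f}")
print(f"examples per sense: {examples_per_sense:.3f}")
```

The recomputed values match the listed percentages and ratios, so the list can be treated as a single consistent snapshot of the dictionary.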

Words dataset

  1. Word Sastrawi
  2. Word spaCy : id
  3. Word name : random-name
  4. Word Indo name : genderprediction
  5. Word Indo place : Wilayah-Administratif-Indonesia
  6. Word Indo place : Indonesia-Postal-Code
  7. Word Wiktionary : word id
  8. Word sentiment : analisis-sentimen
  9. Word sentiment : ID-OpinionWords
  10. Word sentiment : Analisis-Sentimen-ID
  11. Word Acronyms
  12. Word : serangkai

Tagged dataset

  1. NER : yohanesgultom/nlp-experiments 1700 sentences
  2. NER : yusufsyaifudin/indonesia-ner 1835 sentences
  3. POS-TAG : famrashel/idn-tagged-corpus
  4. POS-TAG : pebbie/pebahasa ~600 sentences
  5. POS-TAG Parser : UniversalDependencies/UD_Indonesian-GSD ~4477 sentences
  6. Sentiment 1506 sentences
  7. panl10n Pan Localization
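
Universal Dependencies treebanks such as UD_Indonesian-GSD are distributed in the CoNLL-U format: one token per line, ten tab-separated columns, and a blank line between sentences. As a rough sketch of how such a file could be read with only the standard library, the snippet below parses CoNLL-U text into lists of token dictionaries. The `parse_conllu` function and the `SAMPLE` sentence are hypothetical: the sample was written by hand for illustration and is not taken from the corpus.

```python
from typing import Dict, List

# The ten column names defined by the CoNLL-U format, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Parse CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            continue  # ignore malformed lines
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written illustrative sentence, NOT taken from UD_Indonesian-GSD.
SAMPLE = (
    "# text = Saya makan nasi\n"
    "1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tnasi\tnasi\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
pos_pairs = [(tok["form"], tok["upos"]) for tok in sentences[0]]
print(pos_pairs)
```

To use it on a real treebank file, read the file as UTF-8 text and pass the contents to `parse_conllu`. The (form, upos) pairs can then feed a POS tagger, and the `head` and `deprel` columns a dependency parser.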

Parallel corpus Eng-Ind

  1. parallel-corpora-en-id
  2. Indonesian-English-Bilingual-Corpus
  3. TALPCo
  4. opus
  5. Multi-Wiki

Morph

  1. MALINDO_Morph
  2. morphind
  3. INDRA

Crawler Data

  1. Crawler for Indonesian news portals