avtaylor/lacuna_pos_ner

Lacuna Funded Project: MasakhaNER

Datasets developed by the project are:

Team & Partners

Peter Nabende (Makerere University) - Principal Investigator
David Ifeoluwa Adelani (Masakhane; Saarland University) - NER Coordinator
Bamba Dione (Masakhane; University of Bergen) - POS Coordinator
Jade Abbott (Masakhane; Retro Rabbit) - Data & Translation Coordinator
Constantine Lignos (Masakhane; Brandeis University) - Quality Control
Daniel D’souza (Masakhane) - Tool Management
Sascha Heyer (IO Annotator) - Tool Development & Support

Language Coordinators

Language	Coordinator
Bambara	Allahsera Auguste Tapo
Chichewa	Amelia Taylor
Ewe	Godson Kalipe
Fon	Bonaventure Dossou
Ghomala	Koagne Victoire Memdjokam
Hausa	Tajuddeen Gwadabe
Igbo	Chris Emezue
Kinyarwanda	Happy Buzaaba
Luganda	Jonathan Mukiibi
Luo	Perez Ogayo
Moore	Fatoumata Kabore
Nigerian-Pidgin	Aremu Anuoluwapo
Setswana	Valencia Wagner
Shona	Blessing Sibanda
Swahili	Catherine Gitau
Twi	Edwin Buabeng-Munkoh
Wolof	Derguene Mbaye
isiXhosa	Andiswa Bukula
Yorùbá	Jesujoba Alabi
isiZulu	Rooweither Mabuya

Adding a corpus to the project

Preferably, each language has its own folder, named with the language's three-letter ISO 639-3 code, containing two files:

The corpus, named with the ISO 639-3 language code, e.g. xho.txt
A readme file describing the number of articles, sentences, and tokens in the corpus. If possible, please specify the news categories of the articles, since we prefer a dataset that is balanced across categories.
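
The layout described above can be sketched in Python. This is only an illustration, not a project tool: the function names corpus_stats and add_corpus, the readme filename README.md, and the counting rules (one sentence per line, articles separated by blank lines, whitespace-separated tokens) are all assumptions, so adjust them to match how your corpus is actually formatted.

```python
import re
from pathlib import Path


def corpus_stats(text: str) -> dict:
    """Count articles, sentences, and tokens in a corpus string.

    Assumptions (not specified by the project): articles are separated
    by blank lines, each non-empty line is one sentence, and tokens are
    separated by whitespace.
    """
    articles = [a for a in re.split(r"\n\s*\n", text) if a.strip()]
    sentences = [line for line in text.splitlines() if line.strip()]
    tokens = text.split()
    return {
        "articles": len(articles),
        "sentences": len(sentences),
        "tokens": len(tokens),
    }


def add_corpus(root: Path, iso_code: str, text: str) -> Path:
    """Create <root>/<iso_code>/ holding <iso_code>.txt and a readme."""
    # ISO 639-3 codes are three lowercase letters, e.g. "xho" for isiXhosa.
    # This checks only the shape of the code, not that it is assigned.
    if not re.fullmatch(r"[a-z]{3}", iso_code):
        raise ValueError(f"not a three-letter ISO 639-3 code: {iso_code!r}")

    folder = root / iso_code
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{iso_code}.txt").write_text(text, encoding="utf-8")

    # The readme name and format are hypothetical; news categories
    # would still need to be added by hand.
    stats = corpus_stats(text)
    readme = "".join(f"{name}: {count}\n" for name, count in stats.items())
    (folder / "README.md").write_text(readme, encoding="utf-8")
    return folder
```

Calling add_corpus(Path("."), "xho", corpus_text) would create the folder xho containing xho.txt and a readme with the three counts.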