This repo contains a Luo corpus for Named Entity Recognition (NER). The text comes from the news domain and was scraped from Radio Ramogi. I did this as part of the Masakhane NER project.
NLP, NER, Masakhane
The sentences were obtained from the Ramogi FM website: https://rmsradio.co.ke/brands/ramogi-fm/
Dates published: 1/9/2018 - 10/3/2021. For the most up-to-date information, see README.txt.
This repo contains 3 main files of interest.
This file
Contains a statistical description of the data: news domains, publication dates, and collection dates
Contains a cleaned compilation of the text
The rest are just files used in the collection and cleaning process.
- Clone this repo to your local machine using
https://github.com/Pogayo/Luo-News-Dataset
To get started...
- Option 1
    - 🍴 Fork this repo!
- Option 2
    - 👯 Clone this repo to your local machine using https://github.com/Pogayo/Luo-News-Dataset
- HACK AWAY! 🔨🔨🔨
- 🔃 Create a new pull request
- We are a small team. Join us and let's put Luo on the NLP Map together!
- How do I collect the sentences?
- Go to the Ramogi website. Typically, you will only find the latest news there.
- If you have exhausted the latest news, go to the web archive to get links of earlier news.
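
The archive step above can be partly automated. What follows is only a rough sketch, not the procedure used to build this dataset. It assumes that "the web archive" means the Internet Archive's Wayback Machine and that its public CDX endpoint is used to list captures. The function names (`build_cdx_query`, `snapshot_links`, `fetch_snapshot_links`) are hypothetical, and using the Ramogi FM page as a URL prefix is also an assumption, since older articles may live under other paths.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: "the web archive" is the Internet Archive's Wayback Machine,
# queried through its public CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, start, end):
    """Build a CDX query URL for captures under page_url.

    start and end are timestamp prefixes such as "2018" or "20180901".
    """
    params = {
        "url": page_url,
        "matchType": "prefix",       # include every page below page_url
        "from": start,
        "to": end,
        "output": "json",
        "fl": "timestamp,original",  # only the fields needed to build links
        "collapse": "urlkey",        # keep one capture per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_links(rows):
    """Convert CDX JSON rows (the first row is the header) into snapshot URLs."""
    if not rows:
        return []
    header, *captures = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in captures]


def fetch_snapshot_links(page_url, start, end):
    """Query the CDX endpoint over the network and return snapshot URLs."""
    with urlopen(build_cdx_query(page_url, start, end), timeout=30) as response:
        body = response.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return snapshot_links(rows)
```

Calling `fetch_snapshot_links("https://rmsradio.co.ke/brands/ramogi-fm/", "2018", "2021")` should return archived links you can then open and copy sentences from. Check each link by hand, because not every capture is a news article.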
I am in the process of setting up a wallet. Feel free to reach out to me so that I can give you other payment details in the meantime.
This work is licensed under a Creative Commons Attribution 4.0 International License.