
The repository contains Swahili data from VOA.

VOA Swahili Data Readme

This repository contains text data from the Swahili section of Voice of America (VOA), voaswahili.com. The dataset includes articles retrieved from the site's sitemaps between June 16, 2021 and December 1, 2021. The articles themselves were published from 2001 through December 1, 2021.

All data is in the public domain.


English articles and paragraphs were filtered out using cld3.
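
For readers who want to apply a similar filter to their own text, here is a minimal sketch of language filtering with cld3. It is not the script used to build this dataset. It assumes the `gcld3` Python bindings are installed, and the helper names `make_cld3_detector` and `drop_english`, as well as the byte limits passed to the identifier, are hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, List


def make_cld3_detector() -> Callable[[str], str]:
    """Build a text -> language-code function backed by cld3.

    Assumes the gcld3 package is installed (pip install gcld3).
    The import is done lazily so drop_english() can be used
    with any other detector as well.
    """
    import gcld3

    # min_num_bytes / max_num_bytes are illustrative values,
    # not the settings used for this dataset.
    identifier = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)

    def detect(text: str) -> str:
        # FindLanguage returns a result object; .language is a code such as "sw" or "en".
        return identifier.FindLanguage(text=text).language

    return detect


def drop_english(paragraphs: Iterable[str], detect: Callable[[str], str]) -> List[str]:
    """Keep only the paragraphs that the detector does not label as English."""
    return [p for p in paragraphs if detect(p) != "en"]
```

Usage would look like `drop_english(paragraphs, make_cld3_detector())`. Passing the detector in as an argument keeps the filter independent of any one language-identification library.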