
A repo for interesting local datasets

MIT License

datasets

This repo contains locally curated datasets that can be used to train ML algorithms. The aim is to let Nigerian developers see how standard ML algorithms behave on unorthodox datasets, rather than on the standard datasets used in ML research and in applications deployed globally.

Nairaland Featured Links

This dataset was curated using this tool, which captured all featured links on Nairaland from December 24th, 2005 to September 16th, 2017. Each link is a sentence (the title of a thread in the forum). The links were divided into smaller subsets, as listed below:

  • 5k : contains 5,000 featured links (sentences)
  • 20k : contains 20,000 featured links (sentences)
  • 50k : contains 50,000 featured links (sentences)
  • 148k+ : contains all featured links (more than 148,000)
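
As a rough illustration of how one of these subsets might be explored, here is a minimal Python sketch. It assumes each subset is a plain UTF-8 text file with one thread title per line; the README does not state the file format or file names, so `5k.txt`, `load_titles`, and `top_words` are hypothetical and should be adapted to the actual files.

```python
from collections import Counter
from pathlib import Path


def load_titles(path):
    """Read one thread title per line, skipping blank lines.

    Assumption: the subset is a UTF-8 text file with one title per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_words(titles, n=10):
    """Return the n most common lowercase, whitespace-separated tokens."""
    counts = Counter(word for title in titles for word in title.lower().split())
    return counts.most_common(n)


# Hypothetical file name; replace with the real path of the subset you downloaded.
subset_path = Path("5k.txt")
if subset_path.exists():
    titles = load_titles(subset_path)
    print(len(titles), "titles loaded")
    print(top_words(titles, n=5))
```

Whitespace tokenization is only a starting point; forum titles often mix English, Pidgin, and local names, so a real experiment would likely need a more careful tokenizer.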