dsfsi-datasets

There are 19 repositories under the dsfsi-datasets topic.

  • dsfsi/covid19za

    Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa

    Language: Jupyter Notebook
  • dsfsi/vukuzenzele-nlp

    The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.

    Language: Jupyter Notebook
  • dsfsi/PuoBERTa

    A RoBERTa-based language model designed specifically for Setswana, trained on the new PuoData dataset.

    Language: Makefile
  • dsfsi/gov-za-multilingual

    The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements

    Language: Jupyter Notebook
  • dsfsi/Higher_Education_EDA

    An exploratory data analysis (EDA) repository for education researchers and practitioners.

    Language: Jupyter Notebook
  • dsfsi/za-mavito

    DSFSI South African Terminology Lists and Lexicon Project

    Language: HTML
  • dsfsi/project-state-capture

    Zondo Commission or State Capture Commission Transcripts

  • dsfsi/PuoData

    Curated corpora for Setswana. Used to train PuoBERTa.

  • dsfsi/sa-parliament

    South African Member Of Parliament Data

    Language: Python
  • dsfsi/za-bank-risk

    This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks, with the aim of identifying financial risk. It builds on the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE–FRSE) of El-Haj et al., which created a corpus of annual reports of United Kingdom (UK) companies.

    Language: Jupyter Notebook
  • dsfsi/edu-assessment-llm-prompt

    Educational Assessment using LLMs

    Language: Python
  • dsfsi/embedding-eval-data

    Embedding Evaluation Data for South African Languages

  • dsfsi/izindaba-zesizulu

    Categorised isiZulu News. Source data is the isiZulu news from the SABC social media posts.

  • dsfsi/healthfacilitymap

    South African Health Facility map. Created to aid covid19za responses.

    Language: JavaScript
  • dsfsi/StatsSA-Language

    StatsSA statistical language glossary in machine-readable format

    Language: Jupyter Notebook
  • dsfsi/za-fake-news-2020

    Dataset of South African Disinformation [Fake News] Website Data collected in 2020

  • dsfsi/dlindaba-2019-uber

    Uber Rider Rating Data from DLIndaba 2019

  • dsfsi/za-isizulu-siswati-news-2022

    IsiZulu News (articles and headlines) and Siswati News (headlines) Corpora