Data Science for Social Impact Research Group @ University of Pretoria
We are the Data Science for Social Impact research group at the Computer Science Department, University of Pretoria.
University of Pretoria, South Africa
Pinned Repositories
awesome-africanlp
A curated list of resources dedicated to Natural Language Processing (NLP)
covid19africa
Africa open COVID-19 data working group
covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
deadlines
AI/ML/DS conference, workshop, and event deadlines on the African continent
gov-za-multilingual
The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
Higher_Education_EDA
An exploratory data analysis (EDA) repository for education researchers and practitioners
masakhane-web
Masakhane Web is a translation web application dedicated solely to African languages.
PuoBERTa
A RoBERTa-based language model designed specifically for Setswana, trained on the new PuoData dataset.
textaugment
TextAugment: Text Augmentation Library
vukuzenzele-nlp
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
Data Science for Social Impact Research Group @ University of Pretoria's Repositories
dsfsi/textaugment
TextAugment: Text Augmentation Library
dsfsi/covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
dsfsi/deadlines
AI/ML/DS conference, workshop, and event deadlines on the African continent
dsfsi/vukuzenzele-nlp
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
dsfsi/gov-za-multilingual
The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
dsfsi/PuoBERTa
A RoBERTa-based language model designed specifically for Setswana, trained on the new PuoData dataset.
dsfsi/Higher_Education_EDA
An exploratory data analysis (EDA) repository for education researchers and practitioners
dsfsi/dsfsi-datasets
Datasets made available for different small projects
dsfsi/project-state-capture
Zondo Commission or State Capture Commission Transcripts
dsfsi/PuoData
Curated corpora for Setswana. Used to train PuoBERTa.
dsfsi/sa-parliament
South African Member of Parliament Data
dsfsi/za-terminology
DSFSI South African Terminology Lists and Lexicon Project
dsfsi/embedding-eval-data
Embedding Evaluation Data for South African Languages
dsfsi/gov-za-sona-multilingual
dsfsi/izindaba-zesizulu
Categorised isiZulu News. Source data is the isiZulu news from the SABC social media posts.
dsfsi/za-bank-risk
This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks with the aim of identifying financial risk. It builds on the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE–FRSE) of El-Haj et al., which created a corpus of annual reports of United Kingdom (UK) companies.
dsfsi/healthfacilitymap
South African Health Facility map. Created to support covid19za response efforts.
dsfsi/StatsSA-Language
StatsSA statistical language glossary in machine-readable format
dsfsi/za-fake-news-2020
Dataset of South African Disinformation [Fake News] Website Data collected in 2020
dsfsi/.github
dsfsi/academic-project-page-template
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
dsfsi/bibtextomd
Convert BibTeX entries to formatted Markdown
dsfsi/cos802
Defense against the dark text arts
dsfsi/datacommonsorg-data
dsfsi/datacommonsorg-schema
dsfsi/dlindaba-2019-uber
UBER Rider Rating Data from the DLIndaba 2019
dsfsi/dsfsi-lid
Language Identification For South African languages
dsfsi/edu-assessment-llm-prompt
Educational Assessment Using LLMs
dsfsi/simcse
dsfsi/za-isizulu-siswati-news-2022
IsiZulu News (articles and headlines) and Siswati News (headlines) Corpora - za-isizulu-siswati-news-2022