Data Science for Social Impact Research Group @ University of Pretoria
We are the Data Science for Social Impact research group at the Computer Science Department, University of Pretoria.
University of Pretoria, South Africa
Pinned Repositories
awesome-africanlp
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
covid19africa
Africa open COVID-19 data working group
covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
deadlines
:alarm_clock: AI/ML/DS conference/workshop/event deadlines on the African continent
gov-za-multilingual
The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
masakhane-web
Masakhane Web is a translation web application solely for African languages.
PuoBERTa
A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.
textaugment
TextAugment: Text Augmentation Library
vukuzenzele-nlp
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
za-marito
DSFSI South African Terminology Lists and Lexicon Project
Data Science for Social Impact Research Group @ University of Pretoria's Repositories
dsfsi/textaugment
TextAugment: Text Augmentation Library
dsfsi/covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
dsfsi/deadlines
:alarm_clock: AI/ML/DS conference/workshop/event deadlines on the African continent
dsfsi/vukuzenzele-nlp
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
dsfsi/gov-za-multilingual
The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
dsfsi/PuoBERTa
A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.
dsfsi/Higher_Education_EDA
An exploratory data analysis (EDA) repository for education researchers and practitioners
dsfsi/za-marito
DSFSI South African Terminology Lists and Lexicon Project
dsfsi/dsfsi-datasets
Datasets made available for different small projects
dsfsi/data-commons-data
dsfsi/gov-za-sona-multilingual
dsfsi/izindaba-zesizulu
Categorised isiZulu News. Source data is the isiZulu news from the SABC social media posts.
dsfsi/zabantu-beta
ZaBantu is a fleet of lightweight Masked Language Models for Southern Bantu Languages
dsfsi/healthfacilitymap
South African health facility map, created to aid covid19za responses
dsfsi/simcse
dsfsi/za-fake-news-2020
Dataset of South African Disinformation [Fake News] Website Data collected in 2020
dsfsi/.github
dsfsi/absa-masterclass-hands-on
dsfsi/academic-project-page-template
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
dsfsi/bibtextomd
Convert BibTeX entries to formatted Markdown
dsfsi/cos802
Defense against the dark text arts
dsfsi/datacommonsorg-data
dsfsi/datacommonsorg-schema
dsfsi/dlindaba-2019-uber
Uber rider rating data from DLIndaba 2019
dsfsi/dsfsi-lid
Language Identification For South African languages
dsfsi/edu-assessment-llm-prompt
Educational Assessment using LLMs
dsfsi/flores-fix-4-africa
dsfsi/thapelo-sindane-msc-public
Public repository containing MSc code
dsfsi/za-lid
This repository contains datasets extracted from Vuk'uzenzele, prepared for training language identification models: N-gram models, traditional ML models (Naive Bayes, SVM, and Logistic Regression), and large pretrained multilingual models.
dsfsi/zasca-sum