data-centric-ai
There are 62 repositories under the data-centric-ai topic.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
voxel51/fiftyone
The open-source tool for building high-quality datasets and computer vision models
Docta-ai/docta
A Doctor for your data
code-kern-ai/refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
Renumics/spotlight
Interactively explore unstructured datasets from your dataframe.
HazyResearch/data-centric-ai
Resources for Data Centric AI
daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
cleanlab/cleanvision
Automatically find issues in image datasets and practice data-centric computer vision.
Renumics/awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
dcai-course/dcai-lab
Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024
JieyuZ2/wrench
[NeurIPS 2021] WRENCH: Weak supeRvision bENCHmark
yueyu1030/AttrPrompt
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
aai-institute/pyDVL
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
dcai-course/dcai-course
Introduction to Data-Centric AI, MIT IAP 2023
opendataval/opendataval
OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)
OFA-Sys/DiverseEvol
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
NextBrain-ai/nbsynthetic
nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium-sized datasets
TonyLianLong/UnsupervisedSelectiveLabeling
[ECCV 2022] Official Implementation for Unsupervised Selective Labeling for More Effective Semi-Supervised Learning
astutic/Acharya
A Data Centric NER annotation tool for your Named Entity Recognition projects
KibromBerihu/ai4elife
This data-centric AI repository implements a robust deep learning method (LFBNet) for fully automated tumor segmentation in whole-body 18F-FDG PET/CT images.
nachifur/LLPC
Frontiers in Neuroinformatics 2022: Local Label Point Correction for Edge Detection of Overlapping Cervical Cells
cleanlab/cleanlab-studio
Client interface for all things Cleanlab Studio
awesome-mlops/awesome-data-management
A curated list of awesome open source tools and commercial products to catalog, version, and manage data
ear-team/bambird
Unsupervised classification to improve the quality of a bird song recording dataset. https://doi.org/10.1016/j.ecoinf.2022.101952
sail-sg/D-TRAK
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
kennethleungty/Data-Centric-AI-Competition
Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
Digital-Dermatology/SelfClean
A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.
Lichang-Chen/AlpaGasus
A better Alpaca Model Trained with Less Data (only 9k instructions of the original set)
autonlab/aqua
AQuA: A Benchmarking Tool for Label Quality Assessment
code-kern-ai/refinery-python-sdk
Official Python SDK for Kern AI refinery.
SJTU-DMTai/awesome-ml-data-quality-papers
Papers about training data quality management for ML models.
koalazf99/Awesome-DataCentric-LLM
Trending projects and awesome papers about data-centric LLM studies.
seedatnabeel/Data-IQ
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
xszheng2020/memorization
An Empirical Study of Memorization in NLP (ACL 2022)
Living-with-machines/genre-classification
Jupyter book showing how to build an ML powered book genre classifier
Nokia-Bell-Labs/data-centric-federated-learning
Enhancing Efficiency in Multidevice Federated Learning through Data Selection