large-scale-data-processing
There are 2 repositories under the large-scale-data-processing topic.
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications