semantic-deduplication
There are 2 repositories under the semantic-deduplication topic.
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
MinishLab/semhash
Fast Semantic Text Deduplication & Filtering