huggingface/cosmopedia

Integration with datatrove

Closed this issue · 0 comments

I really like fineweb-edu and datatrove. I noticed that the BERT inference code uses only the datasets library. How should we choose between datasets and datatrove? I like both libraries and am doing similar work, but I’m having a hard time choosing a toolchain. Thank you!