Valde-T's Stars
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
stack-auth/stack
Open-source Auth0/Clerk alternative
DS3Lab/WordScape
The WordScape repository contains code for the WordScape pipeline, which creates datasets for training document understanding models.
DS3Lab/DocParser