mpacula/AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
C++, AGPL-3.0
Stargazers
- AdrianLsk (Amsterdam)
- angrytoast (@tableau-mkt)
- atrilla (Barcelona)
- BeNhNp
- ccoreilly (@parloa)
- CheeseTurtle
- compters (UK)
- DanielBerns
- danuker (Timișoara, Romania)
- darius (open to contracts, not perm)
- Durgesh92 (I AM +)
- endpnt
- erelsgl (Ariel University)
- EricTheAI (Melbourne)
- eternity668
- JDvorak (Kunai)
- jxu (Carnegie Mellon University)
- kgryte (@stdlib-js, @quansight, @data-apis)
- kznmft
- maheshcr (Tataatsu Idealabs)
- mcanthony (DΞFCONCΞPTS)
- mpacula (MGH)
- navta
- nazeeruddinikram (FOSS4Good)
- neostoic
- nimblemachine (NimbleMachine)
- otherland
- pkt
- Planeshifter (@SocketDev, @stdlib-js, @isle-project)
- pshields
- psukys (@Vaultspeed)
- RuedigerMoeller
- Sandy4321
- shivam5992 (Singapore)
- torronen (@kworkme)
- zseder