proycon/python-ucto
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
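To illustrate why tokenisation is less trivial than it looks, here is a minimal sketch of regular-expression-based tokenisation using only Python's standard `re` module. It does not use Ucto or this binding: the pattern, the `tokenise` helper, and the sample sentence are assumptions made for illustration, and Ucto's real rule sets are far more extensive and language-specific.

```python
import re

# Illustrative pattern only; these are NOT Ucto's actual rules.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:[.,]\d+)*      # numbers such as 3.50 or 1,000
    | \w+(?:['-]\w+)*    # words, including contractions and hyphenated forms
    | [^\w\s]            # any other single non-space character (punctuation)
    """,
    re.VERBOSE,
)


def tokenise(text):
    """Return the list of tokens matched by TOKEN_PATTERN (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)


sentence = "Mr. Smith didn't pay $3.50, did he?"

# Naive whitespace splitting leaves punctuation glued to words.
naive = sentence.split()

# The regex-based approach separates punctuation but keeps "didn't" and "3.50" whole.
tokens = tokenise(sentence)

print(naive)
print(tokens)
```

Even this small pattern has to make decisions (is the period in "Mr." part of the abbreviation or a sentence boundary?) that a simple rule cannot settle, which is the kind of case a dedicated tokeniser is designed to handle.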
Cython
Stargazers
- aldanor (Dublin, Ireland)
- brianrusso (Tampa, FL)
- claytonbrown
- darkdown
- emanjavacas
- eugeneai (ISDCT SB RAS)
- fbkarsdorp (KNAW Meertens Institute)
- geledek (Singapore)
- gibranfp (Universidad Nacional Autónoma de México)
- hoi123123
- huangjunwen (GuangZhou, China)
- JT5D (The IMC Lab + Gallery)
- kenoboss
- michaelmior (Rochester Institute of Technology)
- mikepb
- muminoff (Electronic Arts)
- navidalvee (Dhaka, Bangladesh)
- nazeeruddinikram (FOSS4Good)
- nizq
- nschlemm (coder nostra GmbH)
- PanderMusubi (@OpenTaal @nuspell)
- proycon (KNAW Humanities Cluster & CLST, Radboud University)
- roscopecoltran
- Sehaba95 (PhD student @ LIRIS)
- SimonSuster (University of Melbourne)
- souzaonofre (Web Fullstack Developer)
- todun
- ufukhurriyetoglu (Argilla)
- zeichenkette