Turkish NLP library for Haskell. (Pronounced "goo gook".)
Note that this is a personal pet project, heavily influenced by the mighty zemberek-nlp.
- Syllabification (in Guguk.Syllabification)
  - Passes all the tests.
- Phonetics (in Guguk.Phonetics)
  - More usable set of functions for the existing data and types is needed.
- Turkish Alphabet (in Guguk.TurkishAlphabet)
  - ASCIIfying, deASCIIfying functions etc. needed.
- Phonology (in Guguk.Morphology.Phonology)
  - More usable set of functions for Turkish phonology and morphotactical rules.
- Tokenization (Guguk.Tokenization)
  - Basic functionality for sentence boundary detector. (TODO: Handling ":" and "...", and changing from `String` to `Text`.) This can be rewritten using Parsec.
  - Lexer needed.
- POS Tagger (Guguk.Syntax.PosTagger)
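
To give a rough idea of what the "ASCIIfying" item in the list above refers to, here is a minimal, self-contained sketch. It is not Guguk's API: the names `turkishToAscii` and `asciify` are hypothetical and chosen only for illustration, and the sketch works on plain `String` using nothing beyond `base`.

```haskell
module Main where

import Data.Maybe (fromMaybe)

-- Hypothetical lookup table (not part of Guguk): each Turkish-specific
-- letter paired with its closest ASCII counterpart.
turkishToAscii :: [(Char, Char)]
turkishToAscii =
  [ ('ç', 'c'), ('ğ', 'g'), ('ı', 'i'), ('ö', 'o'), ('ş', 's'), ('ü', 'u')
  , ('Ç', 'C'), ('Ğ', 'G'), ('İ', 'I'), ('Ö', 'O'), ('Ş', 'S'), ('Ü', 'U')
  ]

-- Hypothetical helper (not part of Guguk): replace every Turkish-specific
-- letter with its ASCII counterpart and leave all other characters as-is.
asciify :: String -> String
asciify = map (\c -> fromMaybe c (lookup c turkishToAscii))

main :: IO ()
main = putStrLn (asciify "Iğdır Şişli")  -- prints "Igdir Sisli"
```

The reverse direction (deASCIIfying) cannot be done with a table like this, because the mapping is ambiguous: an ASCII "i" may stand for either "i" or "ı", so some lexical or statistical context is needed to pick the right letter.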
I'm very open to pull requests, issues, and any other kind of suggestion. Feedback is especially important since I'm neither a Haskell expert nor a Turkish NLP expert.
- Divan.hs: Ottoman Divan poetry vezin (meter) checker
MIT License