Splitter and tokenizer for Russian

Primary language: Python

RusTokenizer

This is a splitter and tokenizer for the Russian language, written in Python 2.7.

A Python 3.x version of this tokenizer is available in the Readability project, and the description of the updated version can be found here (thanks to sadov-m!).