
More details about the tokenizer?

Thanks for sharing your project and the data. I think the preprocessing steps also make an important difference to translation quality, but the tokenizer is only available as a jar file. Would you mind sharing a little more about its major features or, even better, sharing the source code?