This page describes Vietnamese NLP tools, resources, and some computational techniques related to Vietnamese. If something is missing or incorrect, please open an issue or a pull request.
(Sorry, I don't speak Vietnamese.)
This tool is based on a point-wise approach to diacritics restoration.
DongDu is a fast, high-accuracy word segmentation tool based on SVMs.
vnTokenizer provides a Java interface. I found some unofficial GitHub repositories.
This tool provides a Java interface.
This tool provides a Python interface.
This tool provides a Python interface.
This toolkit is designed for large-scale data processing and includes word segmentation, part-of-speech tagging, and dependency parsing.
Vietnamese treebank
650 thousand sentences.
300 thousand sentences.
1.75 million sentences.
NFC is one of the Unicode normalization forms. It is suitable for Vietnamese text preprocessing.
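A minimal sketch of NFC normalization using Python's standard `unicodedata` module. The helper name `to_nfc` and the sample word are only for illustration. The example shows that the same Vietnamese word can be stored either with precomposed characters or with combining marks, and that NFC maps both to the same composed form.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return `text` in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# "Việt" with the precomposed character U+1EC7 (e with circumflex and dot below).
precomposed = "Vi\u1ec7t"

# The same word written as "e" + combining dot below (U+0323)
# + combining circumflex (U+0302).
decomposed = "Vie\u0323\u0302t"

# The two strings render identically but differ at the code point level.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 4 6

# After NFC normalization they compare equal.
print(to_nfc(decomposed) == precomposed)  # True
```

Normalizing all input once, before tokenization or dictionary lookup, avoids mismatches between visually identical strings.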
- Tuan Anh Luu and Kazuhide Yamamoto. A Point-wise Approach for Vietnamese Diacritics Restoration. Proceedings of the International Conference on Asian Language Processing (IALP 2012), pp. 189-192, November 2012.