Chinese NLP with Open Source Tools in Python
This repository contains the materials used in a presentation at PyCon HK 2015 (http://2015.pycon.hk/). The presentation introduces open source Python tools for general natural language processing in Chinese.
Data
The examples in the presentation use article abstracts from the Traditional Chinese Wikipedia. The Wikipedia XML dumps can be downloaded from http://dumps.wikimedia.org/zhwiki/latest/.
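As a rough illustration of how the abstracts might be read from such a dump, here is a minimal sketch using only the standard library. It assumes the abstract file consists of `<doc>` elements that each contain a `<title>` and an `<abstract>` child; that layout, the function name `iter_abstracts`, and the sample text are assumptions made for this example, not something taken from the presentation materials.

```python
import io
import xml.etree.ElementTree as ET


def iter_abstracts(source):
    """Yield (title, abstract) pairs, one <doc> element at a time.

    `source` is a file path or a binary file object. The <doc>/<title>/
    <abstract> layout is an assumption about the dump format.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            title = (elem.findtext("title") or "").strip()
            abstract = (elem.findtext("abstract") or "").strip()
            if abstract:  # skip entries with an empty abstract
                yield title, abstract
            elem.clear()  # drop the element's children to keep memory use low


# Tiny in-memory stand-in for a real dump file (made-up sample data).
sample = io.BytesIO(
    """<feed>
<doc><title>自然語言處理</title><abstract>自然語言處理是人工智慧和語言學領域的分支學科。</abstract></doc>
<doc><title>空白條目</title><abstract></abstract></doc>
</feed>""".encode("utf-8")
)

pairs = list(iter_abstracts(sample))
for title, abstract in pairs:
    print(title, abstract)
```

Streaming with `iterparse` avoids loading the whole dump into memory, which matters for files of this size. If the downloaded file is gzip-compressed, passing the object returned by `gzip.open(path, "rb")` as `source` should work the same way.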
Tools Introduced
References
- Latent Semantic Analysis: http://nlp.stanford.edu/IR-book/pdf/18lsi.pdf (Introduction to Information Retrieval, Chapter 18)
- Word2Vec: https://code.google.com/p/word2vec/