Course project for CS332 at SJTU. In progress:
- Read corpus from file
- Word segmentation
- Generate word vectors
- POS tagging
- Named Entity Recognition
NOTE: Words that are not in the model are marked as None in the word vectors.
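
The None convention in the note above could look roughly like the sketch below. This is an illustration under stated assumptions, not the project's actual code: `lookup_vectors` and `toy_vectors` are hypothetical names, and a plain dict stands in for the pretrained model. A gensim `KeyedVectors` object should work in its place, since it also supports `in` checks and `[]` lookups.

```python
from typing import List, Mapping, Optional, Sequence


def lookup_vectors(
    words: Sequence[str],
    vectors: Mapping[str, Sequence[float]],
) -> List[Optional[Sequence[float]]]:
    """Return one entry per word: its vector, or None if the word is not in the model."""
    return [vectors[w] if w in vectors else None for w in words]


# Hypothetical toy vocabulary standing in for the real pretrained model.
toy_vectors = {"上海": [0.1, 0.2], "大学": [0.3, 0.4]}

# "交通" is not in the toy vocabulary, so its slot is None.
result = lookup_vectors(["上海", "交通", "大学"], toy_vectors)
```

Keeping one entry per input word, with None as a placeholder, keeps the vectors aligned with the segmented words; downstream code has to check for None before using a vector.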
Dependencies:
- Word2Vec, gensim, model: https://github.com/to-shimo/chinese-word2vec
- Pyltp: https://github.com/HIT-SCIR/pyltp
Usage: Install the libraries above, then download the models into this repo.