EECE 571T Course Project: Word Embedding for the Cantonese Language Based on Word2Vec

This repository contains the source code for the research done in the course project. Scripts for text pre-processing are in the root directory, and scripts for testing are in the test folder. Other related materials are in the data and merged folders. All source code is written in Python.

Team Members

Jeffery Li - 59322511
Jay Fu - 54675301

Package Dependencies

The source code mainly relies on the following Python packages:

  1. jieba
  2. gensim
  3. cython
  4. matplotlib
  5. scikit-learn

These are the main dependencies only; the list may not be exhaustive, and other packages may also be required.