lxy9843@sjtu.edu.cn 2018-8-13
-
Implement training and testing algorithms for GMM. Programmes must be written in C/C++ or python or Matlab.
-
Use train.txt for training and check the result on dev.txt. The complexity of GMM and initialisation of GMM will be decided by you.
-
Once the final GMM configuration is fixed, you will perform classification on test.txt and save the result in the same format as dev.txt.
-
Final submission should include:
- a. Detailed report including:
- i. Initialisation of GMM
- ii. GMM parameter tuning process (likelihood change, result on dev.txt etc.)
- iii. Analysis and discussion
- b. Classification result: test.txt with label
- c. Source code or tools which can be compiled and/or run under windows or linux machine (Ubuntu)
- a. Detailed report including:
-
K: number of GM
- hyper parameter
-
$\mu_i$ : expected valuenp.random.random((K, D)) * np.mean(x, axis=0)
- D: dim
- x: input data
-
$\Sigma_i$ : covariance[np.mat(np.eye(D)) for _ in range(K)]
- D: dim
-
$\pi_i$ : mixing probabilitytemp = np.random.random(K)
pi = temp / np.sum(temp)
- First, illustrate the dataset
-
It's clear that both red and blue clusters are generated by 4 GM (K=4)
-
Run tests on dev.txt to varify that. (acc > 95%, positive)
- Why acc != 100% in test data
- Because some points are too far from it's origin GM's center, and too near to another's.
- results
- classification results
- data
- kaggle competition data
- data.py
- script to handle csv format
- GMM.py
- GMM model, trained when initialized
- train.py
- script using model to finish kaggle task
- run.sh
- python train.py [arg="train"/"test"]
- submit.sh
- submit result($1) to kaggle, comment($2)