COMP30027 Machine Learning Assignment 1

Group Members

Read the training and test data from data/train.csv and data/test.csv into dataframes (df_train and df_test)
Run preprocess(df_train, df_test) to preprocess the data and convert df_test into a NumPy array (np_test)
Run train(df_train) to compute the statistics of the training data (prior probability, mean, standard deviation) for each pose and store them in pose_dict
Run predict(np_test, pose_dict) to predict the pose for each test instance and store the predictions in a list (results)
Run evaluate(results) to get the accuracy of the model
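
The train, predict and evaluate steps above can be sketched as follows. This is a minimal Gaussian Naive Bayes illustration using only the standard library, not the notebook's own code: fit_gaussian_nb, predict_gaussian_nb and accuracy are hypothetical stand-ins for train, predict and evaluate, and the toy rows and labels are made up for the example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Return {label: (prior, means, stdevs)} computed from the training rows."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Population standard deviation, floored to avoid division by zero.
        stdevs = [max(math.sqrt(sum((v - m) ** 2 for v in col) / n), 1e-9)
                  for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, stdevs)
    return model

def log_gaussian(x, mean, stdev):
    """Log of the normal density, used so that products become sums."""
    return (-0.5 * math.log(2 * math.pi * stdev ** 2)
            - (x - mean) ** 2 / (2 * stdev ** 2))

def predict_gaussian_nb(rows, model):
    """Pick, for each row, the label with the highest log posterior score."""
    predictions = []
    for row in rows:
        scores = {
            label: math.log(prior)
            + sum(log_gaussian(x, m, s) for x, m, s in zip(row, means, stdevs))
            for label, (prior, means, stdevs) in model.items()
        }
        predictions.append(max(scores, key=scores.get))
    return predictions

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy data (illustrative only): two well-separated classes.
train_rows = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
train_labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(train_rows, train_labels)
predicted = predict_gaussian_nb([[1.1, 2.1], [4.9, 5.9]], model)
```

Working in log space avoids numerical underflow when many per-attribute likelihoods are multiplied together.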

Run get_con_matrix(results, poses) to build the confusion matrix (con_matrix) for the results obtained in the previous section
Run print_model_eval(con_matrix) to evaluate the model with micro and macro averaging and print the evaluation
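
To make the difference between the two averaging schemes concrete, here is a small sketch. It assumes a confusion matrix with actual classes as rows and predicted classes as columns; confusion_matrix and precision_recall are hypothetical helpers, not the notebook's get_con_matrix or print_model_eval, and the labels are invented.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def ratio(numerator, denominator):
    """Safe division that returns 0.0 when a class has no counts."""
    return numerator / denominator if denominator else 0.0

def precision_recall(matrix):
    """Return micro- and macro-averaged precision and recall."""
    k = len(matrix)
    tp = [matrix[i][i] for i in range(k)]
    fp = [sum(matrix[r][i] for r in range(k)) - tp[i] for i in range(k)]
    fn = [sum(matrix[i]) - tp[i] for i in range(k)]
    return {
        # Micro: pool the counts over all classes, then take the ratio.
        "micro_precision": ratio(sum(tp), sum(tp) + sum(fp)),
        "micro_recall": ratio(sum(tp), sum(tp) + sum(fn)),
        # Macro: take the ratio per class, then average with equal weight.
        "macro_precision": sum(ratio(tp[i], tp[i] + fp[i]) for i in range(k)) / k,
        "macro_recall": sum(ratio(tp[i], tp[i] + fn[i]) for i in range(k)) / k,
    }

classes = ["a", "b", "c"]
actual = ["a", "a", "a", "b", "b", "c"]
predicted = ["a", "a", "b", "b", "c", "c"]
matrix = confusion_matrix(actual, predicted, classes)
scores = precision_recall(matrix)
```

In single-label multiclass classification, micro-averaged precision and recall both equal the overall accuracy, while macro averaging gives small classes the same weight as large ones.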

Load all data from data/all.csv, which combines the data from data/train.csv and data/test.csv, and store it in a dataframe (df_all)
Add headers to df_all
Run plot_qq(df_all, pose, remove=True) to draw a QQ plot of each attribute (x1 to y11) for the given pose
Choose pose from [mountain, downwarddog, childs]
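
As a rough illustration of what a QQ plot compares, the sketch below pairs each sorted observation with the matching standard normal quantile. It is not the notebook's plot_qq: qq_points is a hypothetical helper, the plotting itself is left out, and since the text above does not say what remove=True filters, the values are assumed to be clean already.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # so inv_cdf never receives 0 or 1.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n)
                   for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up values standing in for one attribute of one pose.
points = qq_points([3.0, 1.0, 2.0])
```

If the attribute is roughly normally distributed, these points fall close to a straight line when plotted; systematic curvature suggests skew or heavy tails.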

Read the training and test data from data/train.csv and data/test.csv into dataframes (df_train and df_test)
Run preprocess(df_train, df_test) to preprocess the data and convert df_test into a NumPy array (np_test)
Run predict_kde(np_test, df_train, SIGMA=i) twice in a for loop (sigma = 0.1 and sigma = 5)
Each iteration also runs get_con_matrix(results, poses) to build the confusion matrix for that result and prints it
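
The KDE variant replaces the per-class Gaussian with a kernel density estimate built from the training instances of each class. The sketch below shows one way this could look, assuming a Gaussian kernel whose bandwidth is sigma; log_kde and predict_kde_nb are hypothetical names rather than the notebook's predict_kde, and the data is invented.

```python
import math
from collections import defaultdict

def log_kde(x, samples, sigma):
    """Log of the Gaussian-kernel density estimate at x.

    Uses the log-sum-exp trick so that small sigma values do not underflow.
    """
    exponents = [-((x - s) ** 2) / (2 * sigma ** 2) for s in samples]
    peak = max(exponents)
    total = sum(math.exp(e - peak) for e in exponents)
    normaliser = len(samples) * sigma * math.sqrt(2 * math.pi)
    return peak + math.log(total) - math.log(normaliser)

def predict_kde_nb(test_rows, train_rows, train_labels, sigma):
    """Naive Bayes prediction with one KDE per class and attribute."""
    grouped = defaultdict(list)
    for row, label in zip(train_rows, train_labels):
        grouped[label].append(row)
    predictions = []
    for row in test_rows:
        scores = {}
        for label, members in grouped.items():
            score = math.log(len(members) / len(train_rows))  # log prior
            for j, x in enumerate(row):
                score += log_kde(x, [m[j] for m in members], sigma)
            scores[label] = score
        predictions.append(max(scores, key=scores.get))
    return predictions

# Toy data (illustrative only).
train_rows = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
train_labels = ["a", "a", "a", "b", "b", "b"]
test_rows = [[1.1, 2.1], [4.9, 5.9]]

# Same loop shape as described above: one prediction run per sigma.
results_by_sigma = {}
for sigma in (0.1, 5):
    results_by_sigma[sigma] = predict_kde_nb(
        test_rows, train_rows, train_labels, sigma)
```

A small sigma makes the density follow individual training points closely, while a large sigma smooths it out across the whole attribute range.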

Run this cell to plot the probability density functions of the Gaussian and the KDE (with the given sigma) for the training dataset
Repeat with sigma=0.1 and sigma=0.5
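
The curves being compared can be computed without any plotting library. The following sketch evaluates a fitted Gaussian and two KDEs on a grid and counts their peaks; the sample values, the grid, and the helpers kde_pdf and count_peaks are all assumptions made for illustration, not part of the notebook.

```python
import math
from statistics import NormalDist, mean, pstdev

def kde_pdf(x, samples, sigma):
    """Gaussian-kernel density estimate at x with bandwidth sigma."""
    normaliser = len(samples) * sigma * math.sqrt(2 * math.pi)
    return sum(math.exp(-((x - s) ** 2) / (2 * sigma ** 2))
               for s in samples) / normaliser

def count_peaks(ys):
    """Number of strict local maxima in a sampled curve."""
    return sum(1 for i in range(1, len(ys) - 1)
               if ys[i - 1] < ys[i] > ys[i + 1])

# Made-up attribute values: two close points and one far away.
samples = [1.0, 1.4, 5.0]
grid = [0.01 * i for i in range(601)]  # 0.00 to 6.00

# Single Gaussian fitted with the sample mean and standard deviation.
gaussian = NormalDist(mean(samples), pstdev(samples))
gaussian_curve = [gaussian.pdf(x) for x in grid]

# One KDE curve per bandwidth.
kde_curves = {sigma: [kde_pdf(x, samples, sigma) for x in grid]
              for sigma in (0.1, 0.5)}
peaks = {sigma: count_peaks(curve) for sigma, curve in kde_curves.items()}
```

On this toy sample the fitted Gaussian has a single peak, the sigma=0.1 KDE keeps one peak per training point, and the sigma=0.5 KDE merges the two nearby points into one peak.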

Read the training and test data from data/train.csv and data/test.csv into dataframes (df_train and df_test)
Run predict_kde_rs(df, num) to evaluate the KDE Naive Bayes classifier with random holdout for the given num. 5 is used here.
The result of each prediction run is printed
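
A repeated random holdout can be sketched as below. The text above does not define num, so this example assumes it is the number of independent random splits; random_holdout, the test_fraction and seed parameters, and the placeholder nearest-neighbour classifier are all hypothetical and only stand in for predict_kde_rs and the KDE classifier.

```python
import random

def random_holdout(rows, labels, num, fit_predict, test_fraction=0.2, seed=0):
    """Return the accuracy of `num` independent random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_test = max(1, int(len(rows) * test_fraction))
    accuracies = []
    for _ in range(num):
        rng.shuffle(indices)  # fresh random split on every repetition
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predicted = fit_predict(
            [rows[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [rows[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / n_test)
    return accuracies

def nearest_neighbour(train_rows, train_labels, test_rows):
    """Placeholder classifier: label of the closest training row."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        train_labels[min(range(len(train_rows)),
                         key=lambda i: squared_distance(train_rows[i], row))]
        for row in test_rows
    ]

# Toy data (illustrative only): two well-separated one-dimensional clusters.
rows = [[0.1 * i] for i in range(10)] + [[10 + 0.1 * i] for i in range(10)]
labels = ["a"] * 10 + ["b"] * 10
accuracies = random_holdout(rows, labels, num=5, fit_predict=nearest_neighbour)
for run, value in enumerate(accuracies, start=1):
    print(f"run {run}: accuracy = {value:.2f}")
```

Fixing the seed keeps the splits reproducible between runs, and averaging the per-run accuracies gives a steadier estimate than a single split.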