COMP30027 Machine Learning Assignment 1

Group Members

  1. Tom Zhi Hern 1068268
  2. Peter Qian Ziyu 1067810

=====Naive Bayes Instructions=====

  1. Read train and test data from data/train.csv and data/test.csv into dataframes (df_train & df_test)
  2. Run preprocess(df_train, df_test) to preprocess the data and convert df_test into a numpy array (np_test)
  3. Run train(df_train) to compute statistics from the train data (prior probability, mean, standard deviation) and store them in pose_dict
  4. Run predict(np_test, pose_dict) to predict the test labels and store them in a list (results)
  5. Run evaluate(results) to get the accuracy of the model
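
The train and predict steps above can be sketched in plain Python as follows. This is a minimal illustration of Gaussian Naive Bayes, not the notebook's actual code: train_gnb, log_gaussian, predict_gnb and the toy rows are hypothetical names and data, and it assumes every attribute is numeric with no missing values.

```python
import math
from collections import defaultdict

def train_gnb(rows, labels):
    """Return {label: (prior, means, stdevs)} computed from numeric rows."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        cols = list(zip(*members))
        means = [sum(c) / len(c) for c in cols]
        # Population standard deviation, floored to avoid division by zero.
        stdevs = [max(math.sqrt(sum((v - m) ** 2 for v in c) / len(c)), 1e-9)
                  for c, m in zip(cols, means)]
        model[label] = (len(members) / len(rows), means, stdevs)
    return model

def log_gaussian(x, mean, stdev):
    """Log density of a normal distribution at x."""
    return (-0.5 * math.log(2 * math.pi * stdev ** 2)
            - (x - mean) ** 2 / (2 * stdev ** 2))

def predict_gnb(row, model):
    """Pick the label with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, stdevs) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, s) for x, m, s in zip(row, means, stdevs))
    return max(scores, key=scores.get)

# Toy data (hypothetical): two attributes, two poses.
rows = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
labels = ["mountain", "mountain", "childs", "childs"]
model = train_gnb(rows, labels)
print(predict_gnb([1.1, 2.1], model))
```

Working in log space keeps the product of many small likelihoods from underflowing to zero.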

Question 1

  1. Run get_con_matrix(results, poses) to get confusion matrix (con_matrix) for the result obtained in the previous section
  2. Run print_model_eval(con_matrix) to evaluate the model using micro and macro averaging and print the evaluation
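
As a rough illustration of the two averaging schemes, the sketch below builds a confusion matrix and derives macro- and micro-averaged precision and recall from it. The helper names (build_confusion, averaged_scores) and the toy labels are hypothetical. The rows-are-actual, columns-are-predicted layout is an assumption and may differ from what get_con_matrix produces.

```python
def build_confusion(actual, predicted, classes):
    """matrix[i][j] counts instances of classes[i] predicted as classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def averaged_scores(matrix):
    """Return macro and micro precision/recall from a square confusion matrix."""
    n = len(matrix)
    tp = [matrix[i][i] for i in range(n)]
    fp = [sum(matrix[r][i] for r in range(n)) - tp[i] for i in range(n)]
    fn = [sum(matrix[i]) - tp[i] for i in range(n)]

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        # Macro: average the per-class scores, so every class counts equally.
        "macro_precision": sum(ratio(tp[i], tp[i] + fp[i]) for i in range(n)) / n,
        "macro_recall": sum(ratio(tp[i], tp[i] + fn[i]) for i in range(n)) / n,
        # Micro: pool the counts first, so every instance counts equally.
        "micro_precision": ratio(sum(tp), sum(tp) + sum(fp)),
        "micro_recall": ratio(sum(tp), sum(tp) + sum(fn)),
    }

# Toy labels (hypothetical).
actual = ["mountain", "mountain", "mountain", "childs"]
predicted = ["mountain", "mountain", "childs", "childs"]
matrix = build_confusion(actual, predicted, ["mountain", "childs"])
scores = averaged_scores(matrix)
print(matrix)
print(scores)
```

In single-label multiclass classification, micro-averaged precision and recall both equal accuracy, which is why the two micro values coincide here.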

Question 2

  1. Load all data from data/all.csv, which combines the data from data/train.csv & data/test.csv, and store it in a dataframe (df_all)
  2. Add headers to df_all
  3. Run plot_qq(df_all, pose, remove=True) to draw a QQ plot of each attribute (x1 to y11) for the given pose
  4. Choose pose from [mountain, downwarddog, childs]
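
For reference, the points of a normal QQ plot can be computed without any plotting library. The sketch below is an assumption about what such a plot contains, not the body of plot_qq: qq_points is a hypothetical helper, the plotting positions (i + 0.5) / n are one common convention among several, and the input must contain at least two distinct values.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(values):
    """Pair each sorted, standardised value with its standard normal quantile."""
    n = len(values)
    mu, sd = mean(values), pstdev(values)
    sample = sorted((v - mu) / sd for v in values)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))

# Toy attribute values (hypothetical).
points = qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
for theo, samp in points:
    print(round(theo, 3), round(samp, 3))
```

If the attribute is roughly Gaussian for the chosen pose, the points lie close to the line y = x. Systematic curvature suggests the Gaussian assumption of the Naive Bayes model is a poor fit for that attribute.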

Question 3

Run cell_q3a then cell_q3b

cell_q3a

  1. Read train and test data from data/train.csv and data/test.csv into dataframes (df_train & df_test)
  2. Run preprocess(df_train, df_test) to preprocess the data and convert df_test into a numpy array (np_test)
  3. Run predict_kde(np_test, df_train, SIGMA=i) twice in a for loop (sigma = 0.1 and sigma = 5)
  4. The loop also runs get_con_matrix(results, poses) to get the confusion matrix for each result and prints it
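
The idea behind KDE Naive Bayes can be sketched as follows: instead of fitting one Gaussian per class and attribute, each class-conditional likelihood is the average of Gaussian kernels of width sigma centred on that class's training values. This is a minimal sketch under that assumption, not the notebook's predict_kde. The function names, the toy data and the 1e-300 floor are all hypothetical choices.

```python
import math

def kde_log_likelihood(x, train_values, sigma):
    """Log of the average Gaussian kernel centred on each training value."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x - v) / sigma) ** 2)
                for v in train_values)
    # Floor the density so log() stays defined when every kernel underflows.
    return math.log(max(total / len(train_values), 1e-300))

def predict_kde_sketch(row, train_rows, train_labels, sigma):
    """Pick the label with the highest log prior plus summed KDE log likelihoods."""
    scores = {}
    for label in set(train_labels):
        members = [r for r, l in zip(train_rows, train_labels) if l == label]
        score = math.log(len(members) / len(train_rows))
        for j, x in enumerate(row):
            score += kde_log_likelihood(x, [m[j] for m in members], sigma)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data (hypothetical): two attributes, two poses.
train_rows = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
train_labels = ["mountain", "mountain", "childs", "childs"]
for sigma in (0.1, 5):
    print(sigma, predict_kde_sketch([1.1, 2.1], train_rows, train_labels, sigma))
```

A small sigma makes the density spiky and sensitive to individual training points, while a large sigma smooths it towards a single broad bump.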

cell_q3b

  1. Run this cell to plot the pdf of the Gaussian and of the KDE (with the given sigma) for the train dataset
  2. Repeat with sigma=0.1 and sigma=0.5

Question 4

  1. Read train and test data from data/train.csv and data/test.csv into dataframes (df_train & df_test)
  2. Run predict_kde_rs(df, num) to run KDE Naive Bayes prediction using random holdout with the given num (5 is used here)
  3. The result for each prediction will be printed out
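
Random holdout can be sketched as repeated shuffling of the row indices followed by a train/test cut. The sketch below assumes that num is the number of independent splits, which these instructions do not confirm. random_holdout_splits, test_fraction and seed are hypothetical names, and the 20% test share is an arbitrary choice.

```python
import random

def random_holdout_splits(n_rows, num, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for num independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, int(n_rows * test_fraction))
    for _ in range(num):
        indices = list(range(n_rows))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set, the rest train.
        yield indices[n_test:], indices[:n_test]

splits = list(random_holdout_splits(10, num=5))
for train_idx, test_idx in splits:
    print(sorted(test_idx))
```

Each pair of index lists could then select the rows used to fit and to score the classifier, with the per-split accuracies averaged at the end.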