Machine Learning Exercise

Exercises from the ML course at TUD; for the exercise description and slides, click here.

week7/8 : evaluation

including :

The results are as follows:

  • P1 : learning curve :

  • P1 : ROC curve (my implementation & scikit-learn implementation; a sketch of both follows this list) :
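I don't reproduce the repo code here; instead, a minimal sketch of how the two ROC variants can be lined up, with made-up scores and labels (the curves should match in shape, though scikit-learn drops collinear points by default):

    import numpy as np
    from sklearn.metrics import roc_curve

    def my_roc(scores, labels):
        # sweep a decision threshold over the unique scores (high to low);
        # at each threshold t: FPR = fraction of negatives scored >= t,
        # TPR = fraction of positives scored >= t
        thresholds = np.sort(np.unique(scores))[::-1]
        fpr = np.array([np.mean(scores[labels == 0] >= t) for t in thresholds])
        tpr = np.array([np.mean(scores[labels == 1] >= t) for t in thresholds])
        return fpr, tpr

    scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
    labels = np.array([1, 1, 0, 1, 0, 0])
    print(my_roc(scores, labels))
    print(roc_curve(labels, scores)[:2])   # scikit-learn's (fpr, tpr)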

week6 : Naive Bayes

edit :

  • Many thanks to Professor Heidrich, who pointed out that I used the variance instead of the standard deviation when calculating the Gaussian function, so the results differ slightly.
  • Besides, we took a look at the solution and found a similar mistake in its Gaussian function; the corrected line (a fuller sketch using it follows this list):
     return 1/np.sqrt(2 * np.pi * var) * np.exp(-(x - mean)**2/(2 * var))    # edited code
     #return 1/sqrt(variance * 2 * pi) * exp(-(x - mean)**2/(2*var))  # original code
  • Result of the edited code:
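For context, a minimal Gaussian Naive Bayes sketch built around the corrected density; the function and variable names here are illustrative, not the repo's actual code:

    import numpy as np

    def gaussian_pdf(x, mean, var):
        # density parameterized by the variance (not the standard deviation)
        return 1 / np.sqrt(2 * np.pi * var) * np.exp(-(x - mean) ** 2 / (2 * var))

    def fit(X, y):
        # per-class feature means, variances, and class priors
        return {c: (X[y == c].mean(axis=0), X[y == c].var(axis=0), np.mean(y == c))
                for c in np.unique(y)}

    def predict(params, x):
        # choose the class maximizing log prior + sum of log likelihoods
        def log_posterior(c):
            mean, var, prior = params[c]
            return np.log(prior) + np.sum(np.log(gaussian_pdf(x, mean, var)))
        return max(params, key=log_posterior)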

including :

The results are as follows:

  • P1 : simulation of tossing dice :

    • theoretical probabilities : { 1 : 1/7776 , 2 : 31/7776 , 3 : 211/7776 , 4 : 781/7776 , 5 : 2101/7776 , 6 : 4651/7776 } (a simulation sketch follows this list)

  • P3 : Naive Bayes classifier on Data3 (my implementation & scikit-learn implementation) :
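The fractions above are exactly P(max = k) = (k^5 - (k-1)^5) / 6^5, i.e. the distribution of the maximum of five fair dice, so I assume that is the experiment. A minimal simulation sketch under that assumption (not the repo's actual code):

    import numpy as np

    rng = np.random.default_rng(0)
    rolls = rng.integers(1, 7, size=(100_000, 5))   # 100k experiments, 5 dice each
    maxima = rolls.max(axis=1)

    for k in range(1, 7):
        theory = (k**5 - (k - 1)**5) / 6**5
        empirical = np.mean(maxima == k)
        print(k, round(theory, 5), round(empirical, 5))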

week5 : Random Forest

including :

  • random_forest.py
  • random_forest_SKT.py
  • TODO : DecisionTree_pruning.py : I think it is impossible to implement pruning for a decision tree stored as a nested list. To prune a decision tree, the tree must be modifiable during traversal, which can't easily be realized with a list; a better data structure is a class with mutable child references (see the sketch after this list). If anyone has an idea for doing it with a list, please tell me :-)
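What I mean by using a class: a minimal sketch of a node type whose subtrees can be replaced in place, which is exactly the mutation pruning needs (names are illustrative):

    class TreeNode:
        def __init__(self, feature=None, label=None):
            self.feature = feature    # index of the feature this node splits on
            self.children = {}        # feature value -> child TreeNode
            self.label = label        # predicted class if this node is a leaf

        def is_leaf(self):
            return not self.children

    def prune_to_leaf(node, majority_label):
        # collapse a whole subtree into a leaf in place --
        # the kind of mutation a nested-list representation makes awkward
        node.children = {}
        node.label = majority_label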

The results are as follows:

  • random forest : my implementation (without continuous features) :

  • random forest : scikit-learn implementation (with continuous features; a usage sketch follows) :
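For reference, a minimal scikit-learn random-forest sketch on a synthetic stand-in dataset (the exercise's real data loading is omitted):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # synthetic stand-in for the exercise data
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))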

week4 : Decision Tree with continuous values

edit :

  • After reading the scikit-learn documentation, I reimplemented DT_skt.py in a simpler style (a sketch of what I mean follows this list); for the code, click here.
  • Unfortunately, the results seem the same as before :-(
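Roughly the simpler style I mean, assuming a CSV whose last column is the label (the file name and column layout here are assumptions, not the repo's exact code):

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    data = pd.read_csv("Data3.csv")              # assumed layout: features, then label
    X, y = data.iloc[:, :-1], data.iloc[:, -1]

    clf = DecisionTreeClassifier(criterion="entropy")   # information-gain-style splits
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))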

including :

The results are as follows:

  • P1 DecisionTree_cont_var : my implementation:

  • P2 DecisionTree : scikit-learn implementation:

  • P3 DecisionTree : with gain ratio, on the "Data3.csv" (a gain-ratio sketch follows this list) :
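Gain ratio is the information gain of a split divided by the split information (the entropy of the partition itself), which penalizes many-valued features. A minimal sketch for discrete features (function names are mine, not the repo's):

    import numpy as np
    from collections import Counter

    def entropy(labels):
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gain_ratio(feature_values, labels):
        n = len(labels)
        cond_entropy, split_info = 0.0, 0.0
        for v in set(feature_values):
            subset = [l for f, l in zip(feature_values, labels) if f == v]
            w = len(subset) / n
            cond_entropy += w * entropy(subset)   # weighted child entropy
            split_info -= w * np.log2(w)          # entropy of the partition
        gain = entropy(labels) - cond_entropy
        return gain / split_info if split_info > 0 else 0.0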

week3 : Decision Tree

edit:

  • I made a mistake: in my original code, every time I produced a node I set a feature to False in the shared attr_avail list, so the change leaked into sibling branches and the final decision tree came out too small, with lower accuracy.
  • So we should change the code in dtree_learning(examples, attr_avail, default, cls_index) at line 133, as follows (a short demonstration of why the copy matters comes after this list):
    #attr_avail[best_attr_index] = False   # original code: mutates the shared list
    new_attr_avail = attr_avail[:best_attr_index] + [False] + attr_avail[best_attr_index+1:]   # edited code: builds a fresh copy
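Why the copy matters: Python lists are passed by reference, so an in-place mutation made in one recursive call is visible to its sibling branches. A tiny demonstration:

    attr_avail = [True, True, True]
    branch = attr_avail                  # original approach: both names share one list
    branch[1] = False
    print(attr_avail)                    # [True, False, True] -- siblings see the change

    attr_avail = [True, True, True]
    branch = attr_avail[:1] + [False] + attr_avail[2:]   # edited approach: fresh copy
    print(attr_avail)                    # [True, True, True] -- original stays intact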

including :

The results are as follows:

week2 : visualization

including :

The results are as follows: