Exercises for the ML course at TUD; for the exercise description and slides, click here.
- P1 : learning curve :
- P1 : ROC-curve ( my implementation & scikit-learn implementation ) :
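For reference, a minimal sketch of what such a comparison covers: an ROC curve computed with a plain threshold sweep next to `sklearn.metrics.roc_curve`. This is the textbook construction, not the actual code from this repo:

```python
import numpy as np
from sklearn.metrics import roc_curve

def roc_points(y_true, scores):
    """Sweep thresholds over the scores and collect (FPR, TPR) pairs."""
    thresholds = np.sort(np.unique(scores))[::-1]   # highest threshold first
    P, N = np.sum(y_true == 1), np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tpr.append(np.sum((y_pred == 1) & (y_true == 1)) / P)
        fpr.append(np.sum((y_pred == 1) & (y_true == 0)) / N)
    return np.array(fpr), np.array(tpr)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_points(y_true, scores))
print(roc_curve(y_true, scores))  # scikit-learn reference for comparison
```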
- Many thanks to Professor Heidrich, who pointed out that I used the variance instead of the standard deviation when calculating the Gaussian function, so the result looked a little different.
- Besides, we took a look at the solution and found that it also contains a mistake in its Gaussian function:
```python
return 1/np.sqrt(2 * np.pi * var) * np.exp(-(x - mean)**2 / (2 * var))  # edited code
# return 1/sqrt(variance * 2 * pi) * exp(-(x - mean)**2/(2*var))  # original code
```
- result of the edited code:
- P1 : simulation of tossing dice :
- theoretical calculation : { 1 : 1/7776 , 2 : 31/7776 , 3 : 211/7776 , 4 : 781/7776 , 5 : 2101/7776 , 6 : 4651/7776 }
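These values are exactly P(max = k) = (k^5 - (k-1)^5) / 6^5, i.e. the distribution of the maximum of five fair dice. Assuming that is the experiment being simulated, a minimal sketch of the simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(100_000, 5))   # 100k experiments, 5 dice each
maxima = rolls.max(axis=1)                      # maximum of the five dice

for k in range(1, 7):
    theory = (k**5 - (k - 1)**5) / 6**5         # P(max = k) = (k^5 - (k-1)^5) / 7776
    empirical = np.mean(maxima == k)
    print(f"{k}: theory={theory:.5f}  simulated={empirical:.5f}")
```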
- P3 : Naive Bayes classifier on Data3 ( my implementation & scikit-learn implementation ) :
- random_forest.py
- random_forest_SKT.py
TODO : DecisionTree_pruning.py
- I think it is impossible to implement pruning for a decision tree represented as a list. To prune a decision tree, the tree must be modifiable during traversal, which can't be realized with a list. A better data structure is a class. Or does anyone have an idea how to do it with a list? If you do, please tell me :-)
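For anyone who hits the same wall, here is a minimal sketch of the class-based representation I mean (the names and the `subtree_error` callback are illustrative, not code from this repo): since each node holds references to its children, a subtree can be collapsed in place during a post-order walk, which is exactly what pruning needs.

```python
class Node:
    """Decision-tree node; a leaf has an empty children dict."""
    def __init__(self, feature=None, label=None):
        self.feature = feature   # index of the attribute tested here (None for a leaf)
        self.label = label       # majority class at this node / prediction for a leaf
        self.children = {}       # attribute value -> child Node

    def is_leaf(self):
        return not self.children

def prune(node, subtree_error):
    """Reduced-error-style pruning sketch: replace a subtree by a leaf whenever
    the leaf is no worse on held-out data. `subtree_error` is a hypothetical
    callback that scores a (sub)tree on the validation examples reaching it."""
    if node.is_leaf():
        return node
    for value, child in list(node.children.items()):
        node.children[value] = prune(child, subtree_error)   # post-order walk
    leaf = Node(label=node.label)
    if subtree_error(leaf) <= subtree_error(node):
        return leaf              # collapse in place -- exactly what a flat list makes hard
    return node
```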
- random forest : my implementation (without continuous features) :
- random forest : scikit-learn implementation (with continuous features) :
- after reading the scikit-learn documentation, I implemented DT_skt.py again in a simpler style; for the code, click here
- but the result seems the same as before, unfortunately :-(
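For context, the "simpler style" with scikit-learn usually boils down to a few lines like the following sketch (the column layout of Data3.csv is an assumption here, and the features must already be numeric):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("Data3.csv")             # illustrative: the real file layout may differ
X, y = data.iloc[:, :-1], data.iloc[:, -1]  # assumes the last column is the class label
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```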
- P1 DecisionTree_cont_var : my implementation:
- P2 DecisionTree : scikit-learn implementation:
- P3 DecisionTree : with gain ratio, on the "Data3.csv":
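For context, gain ratio is information gain normalized by the split information of the attribute (the standard C4.5 definition). A minimal sketch, with illustrative helper names:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature_values, labels):
    """Gain ratio = information gain / split information (C4.5)."""
    values, counts = np.unique(feature_values, return_counts=True)
    weights = counts / counts.sum()
    cond_entropy = sum(w * entropy(labels[feature_values == v])
                       for v, w in zip(values, weights))
    info_gain = entropy(labels) - cond_entropy
    split_info = -np.sum(weights * np.log2(weights))
    return info_gain / split_info if split_info > 0 else 0.0

feature = np.array(["sunny", "sunny", "rain", "rain"])
labels = np.array(["no", "yes", "yes", "yes"])
print(gain_ratio(feature, labels))
```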
- I made a mistake : in my original code, every time I created a node I set the chosen feature to "false" in the shared list, so the final decision tree was too small and had lower accuracy.
- so we should change the code in `dtree_learning(examples, attr_avail, default, cls_index)` at line 133, as follows:
```python
#attr_avail[best_attr_index] = False  # original code
new_attr_avail = attr_avail[:best_attr_index] + [False] + attr_avail[best_attr_index+1:]  # edited code
```
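The copy matters because Python lists are passed by reference: mutating the one shared list marks the attribute as used for every sibling branch, not just the current subtree. A tiny illustration:

```python
attr_avail = [True, True, True]
best_attr_index = 1

# Original: mutating the shared list would also disable the attribute
# for every sibling branch explored after the first recursive call.
# attr_avail[best_attr_index] = False

# Fix: build a fresh list for the recursive call; the caller's list
# (and therefore the sibling branches) keeps the attribute available.
new_attr_avail = (attr_avail[:best_attr_index]
                  + [False]
                  + attr_avail[best_attr_index + 1:])
print(attr_avail)      # [True, True, True]  -- unchanged
print(new_attr_avail)  # [True, False, True]
```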
- result of the new code :
- reference : code from Professor Guthier
- P1 : gaussian distribution :
- P2 : visualize :
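Purely as an illustration of what such a plot involves (the exercise's actual parameters are not reproduced here), a sketch that samples from a Gaussian and overlays the density, using the corrected standard-deviation handling from above:

```python
import numpy as np
import matplotlib.pyplot as plt

mean, var = 0.0, 1.0
x = np.linspace(-4, 4, 200)
pdf = 1 / np.sqrt(2 * np.pi * var) * np.exp(-(x - mean)**2 / (2 * var))

# note: numpy's normal() expects the standard deviation, not the variance
samples = np.random.default_rng(0).normal(mean, np.sqrt(var), 10_000)
plt.hist(samples, bins=50, density=True, alpha=0.5, label="samples")
plt.plot(x, pdf, label="pdf")
plt.legend()
plt.show()
```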