Exercises for the ML course at TUD; for the exercise description and slides, click here.
- P1 : learning curve :
- P1 : ROC-curve ( my implementation & scikit-learn implementation ) :
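For reference, a minimal sketch of what such a comparison covers: an ROC curve computed with a plain threshold sweep next to `sklearn.metrics.roc_curve`. This is the textbook construction, not the actual code from this repo:

```python
import numpy as np
from sklearn.metrics import roc_curve

def roc_points(y_true, scores):
    """Sweep thresholds over the scores and collect (FPR, TPR) pairs."""
    thresholds = np.sort(np.unique(scores))[::-1]   # highest threshold first
    P, N = np.sum(y_true == 1), np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tpr.append(np.sum((y_pred == 1) & (y_true == 1)) / P)
        fpr.append(np.sum((y_pred == 1) & (y_true == 0)) / N)
    return np.array(fpr), np.array(tpr)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_points(y_true, scores))
print(roc_curve(y_true, scores))  # scikit-learn reference for comparison
```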
- Many thanks to Professor Heidrich, who pointed out that I used the variance instead of the standard deviation when calculating the Gaussian function, so the result looked a little different.
- Besides, we took a look at the solution and found that it also contains a mistake in its Gaussian function:
```python
return 1/np.sqrt(2 * np.pi * var) * np.exp(-(x - mean)**2 / (2 * var))  # edited code
# return 1/sqrt(variance * 2 * pi) * exp(-(x - mean)**2/(2*var))  # original code
```
- result of the edited code:
- P1 : simulation of tossing dice :
- theoretical calculation : { 1 : 1/7776 , 2 : 31/7776 , 3 : 211/7776 , 4 : 781/7776 , 5 : 2101/7776 , 6 : 4651/7776 }
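These values are exactly P(max = k) = (k^5 - (k-1)^5) / 6^5, i.e. the distribution of the maximum of five fair dice. Assuming that is the experiment being simulated, a minimal sketch of the simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(100_000, 5))   # 100k experiments, 5 dice each
maxima = rolls.max(axis=1)                      # maximum of the five dice

for k in range(1, 7):
    theory = (k**5 - (k - 1)**5) / 6**5         # P(max = k) = (k^5 - (k-1)^5) / 7776
    empirical = np.mean(maxima == k)
    print(f"{k}: theory={theory:.5f}  simulated={empirical:.5f}")
```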
- P3 : Naive Bayes classifier on Data3 ( my implementation & scikit-learn implementation ) :
- random_forest.py
- random_forest_SKT.py
TODO : DecisionTree_pruning.py
- I think it is impossible to implement pruning for a decision tree represented as a list. To prune a decision tree, the tree must be modifiable during traversal, which can't be realized with a list. A better data structure is a class. Or does anyone have an idea how to do it with a list? If you do, please tell me :-)
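For anyone who hits the same wall, here is a minimal sketch of the class-based representation I mean (the names and the `subtree_error` callback are illustrative, not code from this repo): since each node holds references to its children, a subtree can be collapsed in place during a post-order walk, which is exactly what pruning needs.

```python
class Node:
    """Decision-tree node; a leaf has an empty children dict."""
    def __init__(self, feature=None, label=None):
        self.feature = feature   # index of the attribute tested here (None for a leaf)
        self.label = label       # majority class at this node / prediction for a leaf
        self.children = {}       # attribute value -> child Node

    def is_leaf(self):
        return not self.children

def prune(node, subtree_error):
    """Reduced-error-style pruning sketch: replace a subtree by a leaf whenever
    the leaf is no worse on held-out data. `subtree_error` is a hypothetical
    callback that scores a (sub)tree on the validation examples reaching it."""
    if node.is_leaf():
        return node
    for value, child in list(node.children.items()):
        node.children[value] = prune(child, subtree_error)   # post-order walk
    leaf = Node(label=node.label)
    if subtree_error(leaf) <= subtree_error(node):
        return leaf              # collapse in place -- exactly what a flat list makes hard
    return node
```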
- random forest : my implementation (without continuous features) :
- random forest : scikit-learn implementation (with continuous features) :
- after reading the scikit-learn documentation, I implemented DT_skt.py again in a simpler style; for the code, click here
- but the result seems the same as before, unfortunately :-(
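For context, the "simpler style" with scikit-learn usually boils down to a few lines like the following sketch (the column layout of Data3.csv is an assumption here, and the features must already be numeric):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("Data3.csv")             # illustrative: the real file layout may differ
X, y = data.iloc[:, :-1], data.iloc[:, -1]  # assumes the last column is the class label
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```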
- P1 DecisionTree_cont_var : my implementation:
- P2 DecisionTree : scikit-learn implementation:
- P3 DecisionTree : with gain ratio, on the "Data3.csv":
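For context, gain ratio is information gain normalized by the split information of the attribute (the standard C4.5 definition). A minimal sketch, with illustrative helper names:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature_values, labels):
    """Gain ratio = information gain / split information (C4.5)."""
    values, counts = np.unique(feature_values, return_counts=True)
    weights = counts / counts.sum()
    cond_entropy = sum(w * entropy(labels[feature_values == v])
                       for v, w in zip(values, weights))
    info_gain = entropy(labels) - cond_entropy
    split_info = -np.sum(weights * np.log2(weights))
    return info_gain / split_info if split_info > 0 else 0.0

feature = np.array(["sunny", "sunny", "rain", "rain"])
labels = np.array(["no", "yes", "yes", "yes"])
print(gain_ratio(feature, labels))
```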
- I made a mistake : in my original code, every time I created a node I set the chosen feature to "false" in the shared list, so the final decision tree was too small and had lower accuracy.
- so we should change the code in `dtree_learning(examples, attr_avail, default, cls_index)` at line 133, as follows:
```python
#attr_avail[best_attr_index] = False  # original code
new_attr_avail = attr_avail[:best_attr_index] + [False] + attr_avail[best_attr_index+1:]  # edited code
```
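The copy matters because Python lists are passed by reference: mutating the one shared list marks the attribute as used for every sibling branch, not just the current subtree. A tiny illustration:

```python
attr_avail = [True, True, True]
best_attr_index = 1

# Original: mutating the shared list would also disable the attribute
# for every sibling branch explored after the first recursive call.
# attr_avail[best_attr_index] = False

# Fix: build a fresh list for the recursive call; the caller's list
# (and therefore the sibling branches) keeps the attribute available.
new_attr_avail = (attr_avail[:best_attr_index]
                  + [False]
                  + attr_avail[best_attr_index + 1:])
print(attr_avail)      # [True, True, True]  -- unchanged
print(new_attr_avail)  # [True, False, True]
```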
- result of the new code :
- reference : code from Professor Guthier
- P1 : gaussian distribution :
- P2 : visualize :
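Purely as an illustration of what such a plot involves (the exercise's actual parameters are not reproduced here), a sketch that samples from a Gaussian and overlays the density, using the corrected standard-deviation handling from above:

```python
import numpy as np
import matplotlib.pyplot as plt

mean, var = 0.0, 1.0
x = np.linspace(-4, 4, 200)
pdf = 1 / np.sqrt(2 * np.pi * var) * np.exp(-(x - mean)**2 / (2 * var))

# note: numpy's normal() expects the standard deviation, not the variance
samples = np.random.default_rng(0).normal(mean, np.sqrt(var), 10_000)
plt.hist(samples, bins=50, density=True, alpha=0.5, label="samples")
plt.plot(x, pdf, label="pdf")
plt.legend()
plt.show()
```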