implementation of PCC
ChristianSch opened this issue · 3 comments
ChristianSch commented
Reference implementations include molearn's `PCC` (`molearn.classifiers.classifier_chains.PCC`).
ChristianSch commented
My first implementation yields the following losses:
hamming_loss: 0.20202020202020202
zero_one_loss: 0.6868686868686869
whereas `molearn.PCC` achieves the following:
hamming_loss: 0.22643097643097643
zero_one_loss: 0.6818181818181819
Both use the same training/test data and splits, and both use `sklearn.linear_model.LogisticRegression` as `base_classifier`. The losses, however, should be identical.
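For context, here is a minimal sketch of what a probabilistic classifier chain does: train one classifier per label on the features augmented with the preceding labels, then at prediction time enumerate all 2^L label combinations and return the one with maximal estimated joint probability. All names here (`SimplePCC` etc.) are illustrative, not the skml or molearn APIs:

```python
from itertools import product

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


class SimplePCC:
    """Probabilistic classifier chain with exhaustive 2^L inference (sketch)."""

    def __init__(self, base_classifier):
        self.base_classifier = base_classifier

    def fit(self, X, y):
        self.n_labels_ = y.shape[1]
        self.estimators_ = []
        for i in range(self.n_labels_):
            # Label i is predicted from X augmented with labels 0..i-1.
            Xi = np.hstack([X, y[:, :i]])
            clf = clone(self.base_classifier)
            clf.fit(Xi, y[:, i])
            self.estimators_.append(clf)
        return self

    def predict(self, X):
        # For each sample, score every candidate label vector by the chain-rule
        # product of per-label probabilities and keep the most probable one
        # (Bayes-optimal for subset 0/1 loss).
        preds = np.empty((X.shape[0], self.n_labels_), dtype=int)
        for n, x in enumerate(X):
            best_p, best_y = -1.0, None
            for y_cand in product([0, 1], repeat=self.n_labels_):
                p = 1.0
                for i, clf in enumerate(self.estimators_):
                    xi = np.hstack([x, y_cand[:i]]).reshape(1, -1)
                    proba = clf.predict_proba(xi)[0]
                    classes = list(clf.classes_)
                    if y_cand[i] not in classes:
                        p = 0.0
                        break
                    p *= proba[classes.index(y_cand[i])]
                if p > best_p:
                    best_p, best_y = p, y_cand
            preds[n] = best_y
        return preds
```

Exhaustive enumeration is exponential in the number of labels, which is why the reproduction below restricts `y` to the first six labels.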
ChristianSch commented
Reproduction:
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> from skml.problem_transformation import ProbabilisticClassifierChain
>>> from skml.datasets import load_dataset
>>> from molearn.classifiers.classifier_chains import PCC
>>> X, y = load_dataset('yeast')
>>>
>>> y = y[:,:6]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
>>>
>>> pcc = ProbabilisticClassifierChain(LogisticRegression())
>>> pccmo = PCC(LogisticRegression())
>>> pcc.fit(X_train, y_train)
>>> pccmo.fit(X_train, y_train)
<molearn.classifiers.classifier_chains.PCC object at 0x7fb321dc9eb8>
>>> y_pred = pcc.predict(X_test)
>>> y_predmo = pccmo.predict(X_test)
>>> from sklearn.metrics import hamming_loss
>>> from sklearn.metrics import zero_one_loss
>>> hamming_loss(y_pred, y_test)
0.262534435261708
>>> zero_one_loss(y_pred, y_test)
0.6942148760330579
>>> hamming_loss(y_predmo, y_test)
0.27823691460055094
>>> zero_one_loss(y_predmo, y_test)
0.6512396694214876
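One incidental note on the transcript above: sklearn's documented signature is `hamming_loss(y_true, y_pred)`, so the arguments are swapped here. It happens not to matter numerically, because both metrics are symmetric mismatch counts. A quick check:

```python
import numpy as np
from sklearn.metrics import hamming_loss, zero_one_loss

y_true = np.array([[0, 1, 1], [1, 0, 0]])
y_pred = np.array([[0, 1, 0], [1, 1, 0]])

# Both metrics count mismatches, so swapping the arguments leaves the
# values unchanged, even though the documented order is (y_true, y_pred).
assert hamming_loss(y_true, y_pred) == hamming_loss(y_pred, y_true)
assert zero_one_loss(y_true, y_pred) == zero_one_loss(y_pred, y_true)
```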
ChristianSch commented
With the latest iteration of the CC implementation, the following losses are obtained:
>>> hamming_loss(y_pred, y_test)
0.27520661157024795
>>> hamming_loss(y_predmo, y_test)
0.27520661157024795
>>> zero_one_loss(y_pred, y_test)
0.631404958677686
>>> zero_one_loss(y_predmo, y_test)
0.631404958677686
Hence our PCC implementation now works as expected.
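One plausible source of the earlier discrepancy, worth keeping in mind when comparing implementations (this is a general property of multi-label prediction, not a claim about either library's internals): the Bayes-optimal prediction for subset 0/1 loss is the joint mode, while for Hamming loss it is the vector of per-label marginal modes, and the two can differ. A toy joint distribution over two binary labels makes this concrete:

```python
import numpy as np

# Toy joint distribution over two binary labels (rows: y1, columns: y2).
# P(0,0)=0.4, P(0,1)=0.0, P(1,0)=0.3, P(1,1)=0.3
P = np.array([[0.4, 0.0],
              [0.3, 0.3]])

# Bayes-optimal for subset 0/1 loss: the joint mode.
joint_mode = np.unravel_index(P.argmax(), P.shape)  # (0, 0)

# Bayes-optimal for Hamming loss: per-label marginal modes.
p_y1 = P.sum(axis=1)  # [0.4, 0.6]
p_y2 = P.sum(axis=0)  # [0.7, 0.3]
marginal_modes = (int(p_y1.argmax()), int(p_y2.argmax()))  # (1, 0)

# The two optimal predictions differ, so two chain implementations that
# target different losses at inference time will disagree even when their
# fitted per-label models are identical.
```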