aia-uclouvain/pydl8.5

Getting feature names in the Classifier output?


Hello,
I discretized my diabetes dataset and then binarized it, which gives 80 features:
X_train.columns.values
array(["'preg=\'(-inf-1.7]\''", "'preg=\'(1.7-3.4]\''",
"'preg=\'(3.4-5.1]\''", "'preg=\'(5.1-6.8]\''",....])

When I use them in the ODT Classifier
clf = DL85Classifier(max_depth=2, time_limit=600, desc=True, print_output=True)
start = time.perf_counter()
print("Model building...")
clf.fit(X_train, y_train)

I get the output

DL8.5 fitting: Solution found
(nItems, nTransactions) : ( 160, 614 )
Tree: {'feat': 18, 'left': {'feat': 77, 'left': {'value': 0, 'error': 0.0}, 'right': {'value': 1, 'error': 7.0}}, 'right': {'feat': 19, 'left': {'value': 1, 'error': 4.0}, 'right': {'value': 0, 'error': 155.0}}}
Size: 7
Depth: 2
Error: 166.0
LatticeSize: 4770
Runtime: 0.018403
Timeout: False

How do I map feature indices such as 'feat': 77 back to the real column names?

DL85Classifier ignores the column names of X_train and identifies each feature by its column index. For a matrix with n columns, the indices range from 0 to n-1, and the column order is preserved. The first column is reported as 'feat': 0 in the result and the last one as 'feat': n-1.

In your case, to recover the real names, index into X_train.columns.values:
'feat': 0 corresponds to X_train.columns.values[0] ("'preg=\'(-inf-1.7]\''")
...
'feat': n-1 corresponds to X_train.columns.values[n-1]
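
As a rough sketch of that lookup, the helper below walks the printed tree and swaps each index for a name. Note that name_features is a hypothetical helper written for this thread, not part of pydl8.5, and the col_i names are placeholders; in practice you would pass list(X_train.columns.values) instead. It assumes the tree is available as a nested Python dict in the form shown in the output.

```python
def name_features(node, columns):
    """Return a copy of the tree with each 'feat' index replaced by a column name."""
    if "feat" not in node:
        # Leaf node: it only carries 'value' and 'error', so copy it unchanged.
        return dict(node)
    return {
        "feat": columns[node["feat"]],
        "left": name_features(node["left"], columns),
        "right": name_features(node["right"], columns),
    }


# Tree copied from the fit output above.
tree = {'feat': 18,
        'left': {'feat': 77,
                 'left': {'value': 0, 'error': 0.0},
                 'right': {'value': 1, 'error': 7.0}},
        'right': {'feat': 19,
                  'left': {'value': 1, 'error': 4.0},
                  'right': {'value': 0, 'error': 155.0}}}

# Placeholder names (assumption); replace with list(X_train.columns.values).
columns = [f"col_{i}" for i in range(80)]

named_tree = name_features(tree, columns)
print(named_tree["feat"])           # name of the root split
print(named_tree["left"]["feat"])   # name of the split in the left subtree
```

The original tree is left untouched, so the index-based version stays available if it is needed later.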

Thanks. Also, how do I interpret left and right from the parent?

{'feat': 18, 'left': {'feat': 77, 'left': {'value': 0, 'error': 0.0}, 'right': {'value': 1, 'error': 7.0}}, 'right': {'feat': 19, 'left': {'value': 1, 'error': 4.0}, 'right': {'value': 0, 'error': 155.0}}}

Feature 18 is binary with values {0, 1}. Does the left subtree, which splits on feature 77, correspond to feature 18 being 0 or 1? And does the right subtree, which splits on feature 19, correspond to feature 18 being 0 or 1?

Thanks

The 'left' child corresponds to the parent's feature having value 1, while the 'right' child corresponds to value 0.
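
To make the convention concrete, here is a minimal sketch that classifies a single binarized row by hand. predict_one is a hypothetical helper for illustration, not a pydl8.5 function, and the example rows are made up; it assumes the tree is a nested dict as printed above and that a row is a sequence of 0/1 values indexed by column position.

```python
def predict_one(node, row):
    """Descend the tree: go 'left' when the split feature is 1, 'right' when it is 0."""
    while "feat" in node:
        node = node["left"] if row[node["feat"]] == 1 else node["right"]
    # Leaf reached: 'value' holds the predicted class.
    return node["value"]


# Tree copied from the fit output above.
tree = {'feat': 18,
        'left': {'feat': 77,
                 'left': {'value': 0, 'error': 0.0},
                 'right': {'value': 1, 'error': 7.0}},
        'right': {'feat': 19,
                  'left': {'value': 1, 'error': 4.0},
                  'right': {'value': 0, 'error': 155.0}}}

# Made-up rows with 80 binary features each.
row_a = [0] * 80
row_a[18] = 1                  # feature 18 = 1 -> left; feature 77 = 0 -> right
row_b = [0] * 80               # feature 18 = 0 -> right; feature 19 = 0 -> right

pred_a = predict_one(tree, row_a)
pred_b = predict_one(tree, row_b)
print(pred_a, pred_b)
```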

Thanks. I was confused because this did not seem to hold for one of the branches that ends in leaves:
{'feat': 77, 'left': {'value': 0, 'error': 0.0}, 'right': {'value': 1, 'error': 7.0}}

The 'value' attribute in the tree output is not a feature value. It is the prediction, that is, the class predicted by the leaf. Note that only leaf nodes have this attribute. The 'error' attribute is likewise available on leaf nodes and gives the number of training instances misclassified by that leaf.
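
One way to check this reading against the printed output is to collect the leaves and add up their errors, which should match the reported Error of 166.0. The sketch below does that; collect_leaves is a hypothetical helper for illustration, not part of pydl8.5, and it assumes the tree is a nested dict as printed above.

```python
def collect_leaves(node):
    """Return the list of leaf dicts (those with 'value' and 'error') under node."""
    if "feat" not in node:
        return [node]
    return collect_leaves(node["left"]) + collect_leaves(node["right"])


# Tree copied from the fit output above.
tree = {'feat': 18,
        'left': {'feat': 77,
                 'left': {'value': 0, 'error': 0.0},
                 'right': {'value': 1, 'error': 7.0}},
        'right': {'feat': 19,
                  'left': {'value': 1, 'error': 4.0},
                  'right': {'value': 0, 'error': 155.0}}}

leaves = collect_leaves(tree)
total_error = sum(leaf["error"] for leaf in leaves)

for leaf in leaves:
    print("predicted class:", leaf["value"], "misclassified:", leaf["error"])
print("total training error:", total_error)  # 0.0 + 7.0 + 4.0 + 155.0
```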