aia-uclouvain/pydl8.5

How does dl8.5 work with more than 2 classes?


Hi,

I'd love to use pydl8.5 in one of my projects. However, it is not immediately obvious how to binarize the dataset or how to use the decision trees for more than 2 classes. Am I missing something, or is it not possible to use dl8.5 for more than 2 classes?

Hello,

For the binarization, you can use one-hot encoding for categorical features. For numerical features, you can create one binary feature per distinct value v, indicating whether the feature value is less than or equal to v. The largest value can be skipped, since that test is true for every instance. For instance, for a feature A with 5 instances {1,2,2,3,4}, this produces 3 binary features: A ≤ 1 or not, A ≤ 2 or not, A ≤ 3 or not.
These binarization techniques preserve optimality. However, they increase the number of features and hence the time required to learn the decision tree.
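A minimal sketch of the two binarization schemes described above, using only NumPy. The function names `binarize_numerical` and `one_hot` are hypothetical helpers for illustration, not part of the PyDL8.5 API; the resulting 0/1 columns are assumed to be stacked (e.g. with `np.hstack`) into the binary matrix you then pass to the classifier.

```python
import numpy as np


def binarize_numerical(values, name="A"):
    """Hypothetical helper: one binary feature `name <= v` per distinct value v,
    skipping the largest value (that test is true for every instance)."""
    values = np.asarray(values)
    thresholds = np.unique(values)[:-1]  # sorted distinct values, minus the max
    columns = [(values <= t).astype(int) for t in thresholds]
    names = [f"{name} <= {t}" for t in thresholds]
    if not columns:  # constant feature: nothing useful to split on
        return np.zeros((len(values), 0), dtype=int), names
    return np.column_stack(columns), names


def one_hot(values, name="C"):
    """Hypothetical helper: one binary feature `name == c` per category c."""
    values = np.asarray(values)
    categories = np.unique(values)
    matrix = (values[:, None] == categories[None, :]).astype(int)
    names = [f"{name} == {c}" for c in categories]
    return matrix, names


# The example from the answer: feature A with instances {1, 2, 2, 3, 4}.
X_bin, names = binarize_numerical([1, 2, 2, 3, 4])
print(names)  # → ['A <= 1', 'A <= 2', 'A <= 3']
```

Using `<=` thresholds at the observed values means every split a tree could make on the original numerical feature (between two consecutive observed values) corresponds to exactly one of these binary features.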

For problems with more than 2 classes, DL8.5 handles them natively: each leaf predicts the majority class among its instances, and its error is the number of instances that do not belong to that class. If you define your own error function, you can compute the error and the predicted class in the same way.
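A small sketch of the majority-class rule for a single leaf, assuming the leaf is summarized by the number of instances of each class (its per-class supports). `majority_error` is a hypothetical name, and this input format is an assumption for illustration; check the PyDL8.5 documentation for the exact signature a custom error function must have.

```python
def majority_error(class_supports):
    """Hypothetical helper: given the count of instances of each class in a leaf,
    return (error, predicted_class) under the majority-class rule.
    Ties are broken toward the lowest class index (may differ from DL8.5)."""
    predicted_class = max(range(len(class_supports)), key=lambda c: class_supports[c])
    # Every instance not in the majority class counts as misclassified.
    error = sum(class_supports) - class_supports[predicted_class]
    return error, predicted_class


# A leaf with 3 classes: 5 instances of class 0, 12 of class 1, 3 of class 2.
print(majority_error([5, 12, 3]))  # → (8, 1)
```

Nothing here depends on there being exactly two classes, which is why the same rule covers the binary and multiclass cases.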

Nice, so I don't have to binarize manually? Pydl8.5 will handle it on its own?

You have to binarize your data before running PyDL8.5. The first paragraph of my answer describes how to do this.