parrt/dtreeviz

Color keyword argument - Value error


I am working on a binary classification problem using LightGBM.
The model was trained on 42 features. The training dataset has shape (78000, 42): 78000 observations across 42 features.
The test dataset has shape (25220, 42).

Using dtreeviz on my trained model:

viz = dtreeviz.model(gm, tree_index=0, X_train=X_train, y_train=Y_train, feature_names=features, target_name="A", class_names=["A", "B"])

When I execute viz.view(), I get the following error:
ValueError: The 'color' keyword argument must have one color per dataset, but 1 datasets and 0 colors were provided

Any thoughts on how to fix this?

Is this similar to #280?

I am facing the same error.

Here is an image with some details:

[image]

@tlapusan Yes, the error description is the same as the one posted by @baligoyem

Have you ever encountered the AttributeError with the message "'Rectangle' object has no attribute 'patches'"?

I ask because I have hit both errors intermittently, and I believe they are related.

Did you try the latest version of dtreeviz?

Yes, I did, but it did not resolve the issue.

I am using colour 0.1.5 and dtreeviz 2.2.1. No luck at all.

Could you provide a Google Colab or any other shareable notebook so I can reproduce your issue?

@tlapusan Sorry for the late response. Unfortunately, I can't share the notebook because the data and features used are confidential :(

+1

+1

windyd commented

In my case (dtreeviz 2.2.2), this appears to be a precision problem in the get_thresholds method, which rounds numeric thresholds to two decimal places. If the model has small float thresholds, samples are assigned to the wrong paths, and some nodes can end up with no samples.

class ShadowLightGBMTree(ShadowDecTree):
    ...
    def get_thresholds(self) -> np.ndarray:
        if self.thresholds is not None:
            return self.thresholds

        node_thresholds = [-1] * self.nnodes()
        for i in range(self.nnodes()):
            if self.children_left[i] != -1 and self.children_right[i] != -1:
                if self.is_categorical_split(i):
                    node_thresholds[i] = list(map(int, self.tree_nodes[i]["threshold"].split("||")))
                else:
                    ###  thresholds are ROUNDED!
                    node_thresholds[i] = round(self.tree_nodes[i]["threshold"], 2)

        self.thresholds = np.array(node_thresholds, dtype=object)
        return self.thresholds

No samples in a node -> no color mapped -> this error.
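To make the effect concrete, here is a minimal pure-Python sketch of the rounding problem. It does not call dtreeviz or LightGBM. The `split` helper, the sample values, and the threshold are all hypothetical, and the sketch assumes a numeric split sends a sample left when its value is <= the threshold.

```python
def split(values, threshold):
    """Partition values like a numeric tree split (assumed rule: left if value <= threshold)."""
    left = [v for v in values if v <= threshold]
    right = [v for v in values if v > threshold]
    return left, right


# Hypothetical feature values and a small threshold, as a model might learn them.
values = [0.001, 0.002, 0.003, 0.006, 0.007]
true_threshold = 0.004

# Rounding to two decimals, as in the get_thresholds excerpt above, collapses it to 0.0.
rounded_threshold = round(true_threshold, 2)

true_left, true_right = split(values, true_threshold)
viz_left, viz_right = split(values, rounded_threshold)

print("model split:  ", len(true_left), "left,", len(true_right), "right")
print("rounded split:", len(viz_left), "left,", len(viz_right), "right")
```

With the rounded threshold, every sample falls on the right side and the left child receives none, which is the empty-node situation described above. Any threshold that changes when rounded to two decimals can shift samples this way, so the problem is most likely with features on a small numeric scale.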

+1 on 2.2.2. Any workarounds?

I ran into this issue and resolved it. Make sure the data you pass to dtreeviz for visualization is the same data you used to train the LightGBM model.

I am using the same data for visualization and I still get the error. Not sure how to fix it.

+1 v2.2.2