tensorflow/decision-forests

Models trained on pure 1's predict 0

NikolajSafty opened this issue · 3 comments

Hi,

I'm currently facing an issue with models trained on data where the label always has the same value. I'd expect the fitted model to predict the same value as in the training dataset, but I'm only getting 0's when predicting.

Here's an example to demonstrate:

import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow_decision_forests as tfdf

# Generate dummy data
data = {
    'fruit': ['apple'] * 100 + ['banana'] * 100,
    'eatable': [1]*200
}

# Build the DataFrame and split it into train and test sets
df = pd.DataFrame.from_dict(data)
train, test = train_test_split(df, random_state=0)
tf_train = tfdf.keras.pd_dataframe_to_tf_dataset(train, label="eatable")
tf_test = tfdf.keras.pd_dataframe_to_tf_dataset(test, label="eatable")

# Instantiate a model and fit
model = tfdf.keras.CartModel()
model.fit(tf_train)

# Sum the predictions to count the predicted 1's
model.predict(tf_test).sum()

The results are pure 0's. I'd expect the model to predict pure 1's, though.
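
For reference, the behaviour I'd expect matches a simple majority-class baseline. Below is a minimal sketch in plain Python, not TF-DF code: the helper name `majority_class_predict` is hypothetical, and the 150/50 row counts assume the default 75/25 split of `train_test_split` on the 200 rows above.

```python
from collections import Counter


def majority_class_predict(train_labels, n_rows):
    """Hypothetical helper: predict the most frequent training label for every row."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    # most_common(1) returns a list with one (label, count) pair
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_rows


# Assumed sizes: 200 rows with the default 25% test split gives 150 train / 50 test rows.
train_labels = [1] * 150
predictions = majority_class_predict(train_labels, n_rows=50)

# Every test row is predicted as 1, so the sum equals the number of test rows.
print(sum(predictions))
```

With a constant label, any reasonable classifier should reduce to this baseline, so the sum in the TF-DF example above should be close to the number of test rows rather than 0.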

rstz commented

Hi, thank you for reporting this. This is a bug and we're actively working on fixing it in the next few days.

Thanks for the quick feedback, sounds good!

rstz commented

Confirming that this is now fixed at head and that the fix will be included in the next version.