tensorflow/decision-forests

Sample numerical uplift

vitorsrg opened this issue · 3 comments

Hi! Do you have a working snippet of uplift mode? I couldn't find any.

I've tried this very simple implementation:

import tensorflow_decision_forests as tfdf
import pandas as pd

df = pd.DataFrame(
    [
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9],
    ],
    columns=list("abc"))
ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    df,
    label="a",
    task=tfdf.keras.Task.NUMERICAL_UPLIFT)
model = tfdf.keras.GradientBoostedTreesModel(
    task=tfdf.keras.Task.NUMERICAL_UPLIFT,
    uplift_treatment="b")
model.fit(ds)

However, it fails with the following output:

2023-06-20 01:54:38.103514: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Use /var/folders/wn/x8pf0vlx58j_291rhkkk1p640000gn/T/tmpq7bwoiel as temporary training directory
[WARNING 23-06-20 01:54:43.2767 -03 gradient_boosted_trees.cc:1797] "goss_alpha" set but "sampling_method" not equal to "GOSS".
[WARNING 23-06-20 01:54:43.2789 -03 gradient_boosted_trees.cc:1808] "goss_beta" set but "sampling_method" not equal to "GOSS".
[WARNING 23-06-20 01:54:43.2789 -03 gradient_boosted_trees.cc:1822] "selective_gradient_boosting_ratio" set but "sampling_method" not equal to "SELGB".
Reading training dataset...
2023-06-20 01:54:43.317147: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype double and shape [3]
         [[{{node Placeholder/_2}}]]
Training dataset read in 0:00:04.492206. Found 3 examples.
Training model...
2023-06-20 01:54:47.804256: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at kernel_long_process.cc:152 : UNKNOWN: TensorFlow: INVALID_ARGUMENT: No defined default loss for this combination of label type and task
Traceback (most recent call last):
  File "./tfdf.py", line 19, in <module>
    model.fit(ds)
  File ".../tensorflow_decision_forests/keras/core.py", line 1257, in fit
    return self._fit_implementation(
  File ".../tensorflow_decision_forests/keras/core.py", line 1614, in _fit_implementation
    self._train_model(cluster_coordinator=coordinator)
  File ".../tensorflow_decision_forests/keras/core.py", line 2090, in _train_model
    tf_core.train(
  File ".../tensorflow_decision_forests/tensorflow/core.py", line 568, in train
    training_op.SimpleMLCheckStatus(process_id=process_id) == 1
  File ".../tensorflow/python/util/tf_export.py", line 413, in wrapper
    return f(**kwargs)
  File "<string>", line 1371, in simple_ml_check_status
  File ".../tensorflow/python/framework/ops.py", line 7262, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.UnknownError: {{function_node __wrapped__SimpleMLCheckStatus_device_/job:localhost/replica:0/task:0/device:CPU:0}} TensorFlow: INVALID_ARGUMENT: No defined default loss for this combination of label type and task [Op:SimpleMLCheckStatus]

rstz commented

Just as a quick update: a tutorial is in the works, and we've also worked on improving the error messages for the next version.

Your example does not work because:

  • Uplift is only supported for Random Forest models
  • The treatment column needs to be a 0 or 1 variable (does not have treatment or has treatment)

Your example would therefore work if you change it to:

import tensorflow_decision_forests as tfdf
import pandas as pd

df = pd.DataFrame(
    [
        [0.1, 1, 0.3],
        [0.4, 1, 0.6],
        [0.7, 0, 0.9],
        [1.0, 0, 0.9],
    ],
    columns=list("abc"))
ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    df,
    label="a",
    task=tfdf.keras.Task.NUMERICAL_UPLIFT)
model = tfdf.keras.RandomForestModel(
    task=tfdf.keras.Task.NUMERICAL_UPLIFT,
    uplift_treatment="b")
model.fit(ds)

But stay tuned for a full tutorial :)
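
For readers who want to check their data before training, the 0/1 treatment constraint mentioned in this reply can be sketched as a small pre-flight check. This is only an illustration: `check_binary_treatment` is a hypothetical helper, not part of TF-DF, and it assumes the treatment values are available as a plain list (for example via `df["b"].tolist()`).

```python
def check_binary_treatment(values):
    """Return the sorted distinct values that are not 0 or 1.

    Hypothetical helper, not part of TF-DF. An empty result means the
    column satisfies the 0/1 constraint described in this thread.
    """
    allowed = {0, 1}
    return sorted({v for v in values if v not in allowed})


# Treatment column "b" from the original report: continuous values.
print(check_binary_treatment([0.2, 0.5, 0.8]))  # [0.2, 0.5, 0.8]

# Treatment column "b" from the corrected example: only 0 and 1.
print(check_binary_treatment([1, 1, 0, 0]))  # []
```

An empty list means every value is 0 or 1; anything else lists the offending values so they can be recoded before calling `model.fit`.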

Hi, thanks for the reply

The treatment column needs to be a 0 or 1 variable (does not have treatment or has treatment)

I was expecting NUMERICAL and CATEGORICAL uplift to have continuous and categorical/discrete treatment respectively. It would be helpful to make their differences and use cases explicit in the tutorial.

rstz commented

I was expecting NUMERICAL and CATEGORICAL uplift to have continuous and categorical/discrete treatment respectively

That's good to know. I've added a paragraph to the tutorial about the difference: Numerical and Categorical indeed specify the type of outcome, not the type of treatment, in the problem.

The tutorial is now available in documentation/tutorials/uplift_colab.ipynb and will be available on the TensorFlow website once that's updated (a few days?).
Happy to hear feedback!
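
To make the outcome-versus-treatment distinction concrete: in both tasks the treatment is binary, and only the outcome type differs. The sketch below is not TF-DF code and does not reflect how the model is trained. It only computes a naive dataset-level uplift (mean outcome of treated rows minus mean outcome of control rows) in plain Python. The helper `naive_uplift` and the binary outcomes in the second call are made up for illustration; the numerical outcomes and treatments are taken from the corrected example earlier in the thread.

```python
def naive_uplift(outcomes, treatments):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for y, t in zip(outcomes, treatments) if t == 1]
    control = [y for y, t in zip(outcomes, treatments) if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control row")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Binary treatment column "b" from the corrected example.
treatments = [1, 1, 0, 0]

# Numerical outcome (the NUMERICAL_UPLIFT setting): label "a" from the example.
numerical = naive_uplift([0.1, 0.4, 0.7, 1.0], treatments)

# Binary outcome (the CATEGORICAL_UPLIFT setting): illustrative values only.
categorical = naive_uplift([1, 1, 0, 1], treatments)

print(round(numerical, 3))    # -0.6
print(round(categorical, 3))  # 0.5
```

In the first call the result is a difference in average outcome value. In the second it is a difference in the rate of the positive outcome. The treatment column is the same 0/1 variable in both cases.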