Sample numerical uplift
vitorsrg opened this issue · 3 comments
Hi! Do you have a working snippet of uplift mode? I couldn't find any.
I've tried this very simple implementation:
import tensorflow_decision_forests as tfdf
import pandas as pd

df = pd.DataFrame(
    [
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9],
    ],
    columns=list("abc"))

ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    df,
    label="a",
    task=tfdf.keras.Task.NUMERICAL_UPLIFT)

model = tfdf.keras.GradientBoostedTreesModel(
    task=tfdf.keras.Task.NUMERICAL_UPLIFT,
    uplift_treatment="b")
model.fit(ds)
However, it fails with the following output:
2023-06-20 01:54:38.103514: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Use /var/folders/wn/x8pf0vlx58j_291rhkkk1p640000gn/T/tmpq7bwoiel as temporary training directory
[WARNING 23-06-20 01:54:43.2767 -03 gradient_boosted_trees.cc:1797] "goss_alpha" set but "sampling_method" not equal to "GOSS".
[WARNING 23-06-20 01:54:43.2789 -03 gradient_boosted_trees.cc:1808] "goss_beta" set but "sampling_method" not equal to "GOSS".
[WARNING 23-06-20 01:54:43.2789 -03 gradient_boosted_trees.cc:1822] "selective_gradient_boosting_ratio" set but "sampling_method" not equal to "SELGB".
Reading training dataset...
2023-06-20 01:54:43.317147: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype double and shape [3]
[[{{node Placeholder/_2}}]]
Training dataset read in 0:00:04.492206. Found 3 examples.
Training model...
2023-06-20 01:54:47.804256: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at kernel_long_process.cc:152 : UNKNOWN: TensorFlow: INVALID_ARGUMENT: No defined default loss for this combination of label type and task
Traceback (most recent call last):
File "./tfdf.py", line 19, in <module>
model.fit(ds)
File ".../tensorflow_decision_forests/keras/core.py", line 1257, in fit
return self._fit_implementation(
File ".../tensorflow_decision_forests/keras/core.py", line 1614, in _fit_implementation
self._train_model(cluster_coordinator=coordinator)
File ".../tensorflow_decision_forests/keras/core.py", line 2090, in _train_model
tf_core.train(
File ".../tensorflow_decision_forests/tensorflow/core.py", line 568, in train
training_op.SimpleMLCheckStatus(process_id=process_id) == 1
File ".../tensorflow/python/util/tf_export.py", line 413, in wrapper
return f(**kwargs)
File "<string>", line 1371, in simple_ml_check_status
File ".../tensorflow/python/framework/ops.py", line 7262, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.UnknownError: {{function_node __wrapped__SimpleMLCheckStatus_device_/job:localhost/replica:0/task:0/device:CPU:0}} TensorFlow: INVALID_ARGUMENT: No defined default loss for this combination of label type and task [Op:SimpleMLCheckStatus]
Just a quick update: a tutorial is in the works, and we've also worked on improving the error messages for the next version.
Your example does not work because:
- Uplift is only supported for Random Forest models
- The treatment column needs to be a 0 or 1 variable (does not have treatment or has treatment)
Your example would therefore work if you change it to:
import tensorflow_decision_forests as tfdf
import pandas as pd

# Columns: "a" = numerical outcome (label), "b" = treatment (0 or 1), "c" = feature.
df = pd.DataFrame(
    [
        [0.1, 1, 0.3],
        [0.4, 1, 0.6],
        [0.7, 0, 0.9],
        [1.0, 0, 0.9],
    ],
    columns=list("abc"))

ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    df,
    label="a",
    task=tfdf.keras.Task.NUMERICAL_UPLIFT)

# Uplift requires a Random Forest model rather than Gradient Boosted Trees.
model = tfdf.keras.RandomForestModel(
    task=tfdf.keras.Task.NUMERICAL_UPLIFT,
    uplift_treatment="b")
model.fit(ds)
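A small check along these lines could catch the treatment-column problem before calling fit. This is only a sketch: check_binary_treatment is a hypothetical helper, not part of TF-DF. It encodes the 0/1 requirement described above, plus the assumption that both groups should be present.

```python
def check_binary_treatment(values):
    """Raise ValueError unless every treatment value is 0 or 1.

    Hypothetical helper, not a TF-DF API. Accepts any iterable of values,
    for example a plain list or a pandas Series such as df["b"].
    """
    seen = set()
    for v in values:
        if v not in (0, 1):
            raise ValueError(f"treatment must be 0 or 1, got {v!r}")
        seen.add(int(v))
    # Assumption (not stated in this thread): both groups should be present.
    if seen != {0, 1}:
        raise ValueError("need at least one control (0) and one treated (1) example")


# Treatment column from the corrected example: passes silently.
check_binary_treatment([1, 1, 0, 0])

# Treatment column from the original example: continuous values are rejected.
try:
    check_binary_treatment([0.2, 0.5, 0.8])
except ValueError as err:
    print(err)
```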
But stay tuned for a full tutorial :)
Hi, thanks for the reply.
> The treatment column needs to be a 0 or 1 variable (does not have treatment or has treatment)
I was expecting NUMERICAL and CATEGORICAL uplift to have continuous and categorical/discrete treatment respectively. It would be helpful if the tutorial made their differences and use cases explicit.
> I was expecting NUMERICAL and CATEGORICAL uplift to have continuous and categorical/discrete treatment respectively
That's good to know. I've added a paragraph to the tutorial about the difference: Numerical and Categorical indeed specify the type of outcome, not the type of treatment in the problem.
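To make the distinction concrete, here is a rough sketch in plain Python. The treatment is binary in both cases and only the outcome type changes. naive_uplift is a hypothetical helper that compares group means; it is not how the Random Forest estimates uplift. The categorical outcome values are made up for illustration.

```python
from statistics import mean


def naive_uplift(outcomes, treatments):
    """Mean outcome of the treated group (1) minus mean outcome of the control group (0).

    Hypothetical helper for illustration only, not a TF-DF API.
    """
    treated = [y for y, t in zip(outcomes, treatments) if t == 1]
    control = [y for y, t in zip(outcomes, treatments) if t == 0]
    return mean(treated) - mean(control)


# The treatment is binary regardless of the uplift task.
treatments = [1, 1, 0, 0]

# Numerical outcome (NUMERICAL_UPLIFT setting): column "a" of the corrected example.
numerical_outcome = [0.1, 0.4, 0.7, 1.0]
print(naive_uplift(numerical_outcome, treatments))

# Categorical outcome (CATEGORICAL_UPLIFT setting): made-up values,
# 1 = converted and 0 = did not convert.
categorical_outcome = [1, 1, 0, 1]
print(naive_uplift(categorical_outcome, treatments))
```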
The tutorial is now available in documentation/tutorials/uplift_colab.ipynb and will be available on the TensorFlow website once that's updated (probably in a few days).
Happy to hear feedback!