tensorflow/decision-forests

'use_predefined_hps' auto-tuning failure for RANKING tasks

changhyunlee806 opened this issue · 7 comments

There seems to be a bug in 'use_predefined_hps' automatic tuning for RANKING tasks: the SELGB sampling method raises an invalid-argument error.
I have reproduced the error using the Google Colab stock example script:
https://colab.research.google.com/drive/1CItiBTm4jN7TKTsg5hozpg3T0f0FiLXD?usp=sharing

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, but the issue persists with the stock example code.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): posix, Linux 5.10.147+
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on a mobile device: N/A
TensorFlow version (use command below): 1.2.0
TensorFlow Decision Forest version: 1.3.0
Python version: 3.9.16
CUDA/cuDNN version: N/A
GPU model and memory: N/A

Below is the error log:

[screenshot of the error log]

rstz commented

Hi, thank you for the detailed report! This does indeed look like a bug; a fix is in our pipeline.

rstz commented

FYI: The bug has been fixed and the fix will be included in the next version of TF-DF. I haven't looked into workarounds, is this something you need?

changhyunlee806 commented

Thank you for the fix; I will look forward to the next version of TF-DF. As a workaround, I have manually configured the tuner based on the hyperparameter log that the Vertex AI job records before the trainer throws the error. However, this means I have excluded the hyperparameter that triggers it. Let me know if you have a better idea for a workaround!

Thank you for raising this issue. I am experiencing the same problem with no workaround.

rstz commented

The best workaround would be to define the hyperparameters yourself. The quick-and-dirty way to do this is to look up the proto used for the search space and hack it into TF-DF; see the code below. Let me know if this is useful.

Context: At this point we've entered the world of configuring TF-DF through protobuf messages. This is the most flexible way to configure TF-DF (or, more precisely, its backend, Yggdrasil Decision Forests), but it's clearly not very convenient. Since the proto below overrides the entire training configuration, any other options you want to set (such as num_trials) must also be specified in the textproto. You can find the proto definitions here (for the main TrainingConfig) and here (for the HyperparameterTuner extension). In the code below, I've manually removed the problematic option from the TrainingConfig.
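As an aside, the text_format.Parse pattern works the same way for any protobuf message. A minimal illustration using protobuf's well-known Struct type (not a TF-DF proto; purely to show the parse-into-a-message mechanism):

```python
from google.protobuf import struct_pb2, text_format

# Parse a textproto string into a message instance; Parse fills the
# message in place (and also returns it).
msg = struct_pb2.Struct()
text_format.Parse('fields { key: "num_trials" value { number_value: 10 } }', msg)
print(msg["num_trials"])  # 10.0
```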

Maybe someone can also translate the proto into a nice combination of TF-DF settings?

import tensorflow_decision_forests as tfdf
from google.protobuf import text_format

textproto = """
learner: "HYPERPARAMETER_OPTIMIZER"
[yggdrasil_decision_forests.model.hyperparameters_optimizer_v2.proto.hyperparameters_optimizer_config] {
  base_learner {
    learner: "GRADIENT_BOOSTED_TREES"
  }
  optimizer {
    optimizer_key: "RANDOM"
    [yggdrasil_decision_forests.model.hyperparameters_optimizer_v2.proto.random] {
      num_trials: 10
    }
  }
  base_learner_deployment {
    num_threads: 1
  }
  search_space {
    fields {
      name: "split_axis"
      discrete_candidates {
        possible_values {
          categorical: "AXIS_ALIGNED"
        }
        possible_values {
          categorical: "SPARSE_OBLIQUE"
        }
      }
      children {
        name: "sparse_oblique_projection_density_factor"
        discrete_candidates {
          possible_values {
            real: 1
          }
          possible_values {
            real: 2
          }
          possible_values {
            real: 3
          }
          possible_values {
            real: 4
          }
          possible_values {
            real: 5
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "SPARSE_OBLIQUE"
          }
        }
      }
      children {
        name: "sparse_oblique_normalization"
        discrete_candidates {
          possible_values {
            categorical: "NONE"
          }
          possible_values {
            categorical: "STANDARD_DEVIATION"
          }
          possible_values {
            categorical: "MIN_MAX"
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "SPARSE_OBLIQUE"
          }
        }
      }
      children {
        name: "sparse_oblique_weights"
        discrete_candidates {
          possible_values {
            categorical: "BINARY"
          }
          possible_values {
            categorical: "CONTINUOUS"
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "SPARSE_OBLIQUE"
          }
        }
      }
    }
    fields {
      name: "categorical_algorithm"
      discrete_candidates {
        possible_values {
          categorical: "CART"
        }
        possible_values {
          categorical: "RANDOM"
        }
      }
    }
    fields {
      name: "growing_strategy"
      discrete_candidates {
        possible_values {
          categorical: "LOCAL"
        }
        possible_values {
          categorical: "BEST_FIRST_GLOBAL"
        }
      }
      children {
        name: "max_num_nodes"
        discrete_candidates {
          possible_values {
            integer: 16
          }
          possible_values {
            integer: 32
          }
          possible_values {
            integer: 64
          }
          possible_values {
            integer: 128
          }
          possible_values {
            integer: 256
          }
          possible_values {
            integer: 512
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "BEST_FIRST_GLOBAL"
          }
        }
      }
      children {
        name: "max_depth"
        discrete_candidates {
          possible_values {
            integer: 3
          }
          possible_values {
            integer: 4
          }
          possible_values {
            integer: 6
          }
          possible_values {
            integer: 8
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "LOCAL"
          }
        }
      }
    }
    fields {
      name: "sampling_method"
      discrete_candidates {
        possible_values {
          categorical: "RANDOM"
        }
      }
      children {
        name: "subsample"
        discrete_candidates {
          possible_values {
            real: 0.6
          }
          possible_values {
            real: 0.8
          }
          possible_values {
            real: 0.9
          }
          possible_values {
            real: 1
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "RANDOM"
          }
        }
      }
      children {
        name: "selective_gradient_boosting_ratio"
        discrete_candidates {
          possible_values {
            real: 0.01
          }
          possible_values {
            real: 0.05
          }
          possible_values {
            real: 0.1
          }
          possible_values {
            real: 0.2
          }
          possible_values {
            real: 1
          }
        }
        parent_discrete_values {
          possible_values {
            categorical: "SELGB"
          }
        }
      }
    }
    fields {
      name: "shrinkage"
      discrete_candidates {
        possible_values {
          real: 0.02
        }
        possible_values {
          real: 0.05
        }
        possible_values {
          real: 0.1
        }
      }
    }
    fields {
      name: "min_examples"
      discrete_candidates {
        possible_values {
          integer: 5
        }
        possible_values {
          integer: 7
        }
        possible_values {
          integer: 10
        }
        possible_values {
          integer: 20
        }
      }
    }
    fields {
      name: "use_hessian_gain"
      discrete_candidates {
        possible_values {
          categorical: "true"
        }
        possible_values {
          categorical: "false"
        }
      }
    }
    fields {
      name: "num_candidate_attributes_ratio"
      discrete_candidates {
        possible_values {
          real: 0.2
        }
        possible_values {
          real: 0.5
        }
        possible_values {
          real: 0.9
        }
        possible_values {
          real: 1
        }
      }
    }
  }
}
"""
# Parse the textproto into a TrainingConfig and override the tuner's
# internal configuration with it.
training_config = tfdf.tuner.abstract_learner_pb2.TrainingConfig()
text_format.Parse(textproto, training_config)
tuner = tfdf.tuner.RandomSearch(num_trials=10, use_predefined_hps=True)
tuner._train_config = training_config
model = tfdf.keras.GradientBoostedTreesModel(
    task=tfdf.keras.Task.RANKING,
    ranking_group="group",
    num_trees=50,
    tuner=tuner)
model.fit(dataset_ds)

rstz commented

Finally got time to write down a nicer workaround. This generates the same set of options as use_predefined_hps=True, except for the one setting causing the issue.

tuner = tfdf.tuner.RandomSearch(num_trials=10)
tuner.choice("split_axis", ["AXIS_ALIGNED"])
oblique_space = tuner.choice("split_axis", ["SPARSE_OBLIQUE"], merge=True)
oblique_space.choice("sparse_oblique_projection_density_factor", [1.0, 2.0, 3.0, 4.0, 5.0])
oblique_space.choice("sparse_oblique_normalization",
                     ["NONE", "STANDARD_DEVIATION", "MIN_MAX"])
oblique_space.choice("sparse_oblique_weights", ["BINARY", "CONTINUOUS"])

tuner.choice("categorical_algorithm", ["CART", "RANDOM"])
local_growing_strategy = tuner.choice("growing_strategy", ["LOCAL"])
best_first_global_growing_strategy = tuner.choice("growing_strategy", ["BEST_FIRST_GLOBAL"], merge=True)
best_first_global_growing_strategy.choice("max_num_nodes", [16, 32, 64, 128, 256, 512])
local_growing_strategy.choice("max_depth", [3, 4, 6, 8])
random_sampling = tuner.choice("sampling_method", ["RANDOM"])
random_sampling.choice("subsample", [0.6, 0.8, 0.9, 1.0])
tuner.choice("shrinkage", [0.02, 0.05, 0.1])
tuner.choice("min_examples", [5, 7, 10, 20])
tuner.choice("use_hessian_gain", ["true", "false"])
tuner.choice("num_candidate_attributes_ratio", [0.2, 0.5, 0.9, 1.0])

Since this issue seems to impact multiple people, I'll keep it open for better discoverability until the next TF-DF version is released.

rstz commented

Fixed with today's 1.4.0 release.