Improving performance
Hey there! Sorry to bother you with such a specific problem. Say I have a dataset with the following columns:
- latitude
- longitude
- regression_target (predicting % nitrogen in soil)
- other features (string categories, etc.)

I tried using GPBoost hoping that it would perform better than a naive LightGBM, but I'm finding that the two are basically on par with one another. To be fair, I only have 70 data points, so perhaps GPBoost hasn't had the opportunity to shine.
I'm running a hyperparameter search over the following grid:
```python
{
    "num_leaves": {
        "values": list(range(10, 101, 10)),
    },
    "n_estimators": {
        "values": list(range(100, 1001, 100)),
    },
    "learning_rate": {
        "values": [el / 1000 for el in range(1, 100)] + [el / 10 for el in range(1, 20)],
    },
    "max_depth": {
        "values": list(range(4, 100, 5)),
    },
    "lambda_l1": {
        "values": [0.1, 0.2, 0.3, 0.5, 0.7],
    },
    "lambda_l2": {
        "values": [0.1, 0.2, 0.3, 0.5, 0.7],
    },
    "extra_trees": {
        "values": [True, False],
    },
    "cov_function": {
        "values": ["exponential", "gaussian"],
    },
}
```
(This is in Weights & Biases' sweep-config format, in case it looks unusual to you.)
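For context, here is a minimal sketch of how a grid like this typically plugs into a W&B sweep. The `parameter_grid` name stands for the dict above, and the project name, metric name, and `train_fn` body are hypothetical placeholders, not from this thread:

```python
import wandb

# Hypothetical wrapper around the parameter grid shown above.
sweep_config = {
    "method": "grid",  # assumption: grid search; "random" and "bayes" are alternatives
    "metric": {"name": "val_rmse", "goal": "minimize"},  # hypothetical metric name
    "parameters": parameter_grid,  # the dict shown above
}

def train_fn():
    with wandb.init() as run:
        config = dict(wandb.config)  # one sampled hyperparameter combination
        # ... build data_train / gp_model and call gpb.train(params=config, ...)
        # then report the validation score, e.g. run.log({"val_rmse": rmse})

sweep_id = wandb.sweep(sweep_config, project="soil-nitrogen")  # hypothetical project
wandb.agent(sweep_id, function=train_fn)
```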
I'm then setting up training as:

```python
import gpboost as gpb

# The columns in categorical_features are left as strings
data_train = gpb.Dataset(train_x, train_y, categorical_feature=categorical_features)
# cov_function is as specified in the hyperparameter search
gp_model = gpb.GPModel(gp_coords=coords_train, cov_function=cov_function)
bst = gpb.train(params=config, train_set=data_train, gp_model=gp_model)
```
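Not part of the original post, but for completeness, prediction with a trained GPBoost model combines the tree ensemble ("fixed effect") and the GP ("random effect") parts, per the GPBoost documentation. A minimal sketch, where `test_x` and `coords_test` are assumed hold-out data:

```python
# Predict at new locations; the GP part needs the prediction coordinates.
pred = bst.predict(data=test_x, gp_coords_pred=coords_test, predict_var=True)

# GPBoost returns the two components separately; the point prediction is their sum.
y_pred = pred["fixed_effect"] + pred["random_effect_mean"]
```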
Is there anything here that looks glaringly incorrect to you?
Apologies for my slow reply. This looks good. Maybe also include shallower trees and fewer leaves:
"max_depth": {
"values": list(range(2, 98, 5)),
},
"num_leaves": {
"values": list(range(2, 103, 10)),
},
70 data points is very little, though. I'm not sure I would use any machine learning model at all...
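Following up on that last point, a plain Gaussian process (the `GPModel` alone, with no boosted trees) may be a more sensible baseline at this sample size. A minimal sketch, assuming the same `coords_train`, `train_y`, and `coords_test` as above; the returned key names follow the GPBoost docs but treat them as an assumption:

```python
import gpboost as gpb

# Fit a pure GP regression on the spatial coordinates, without a tree ensemble.
gp_model = gpb.GPModel(gp_coords=coords_train, cov_function="exponential")
gp_model.fit(y=train_y)  # estimates the covariance parameters by maximum likelihood

# Predict at new locations; 'mu' is the predictive mean, 'var' the predictive variance.
pred = gp_model.predict(gp_coords_pred=coords_test, predict_var=True)
y_pred = pred["mu"]
```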