fabsig/GPBoost

I can't save the GPModel/GPBoostClassifier


Error message

```
~/Desktop/Codes/WinProbability/carrot_v1/lib/python3.9/site-packages/gpboost/basic.py in model_to_dict(self, include_response_data)
   5854         model_dict["X"] = self._get_covariate_data()
   5855         # Additional likelihood parameters (e.g., shape parameter for a gamma likelihood)
-> 5856         model_dict["params"]["init_aux_pars"] = self.get_aux_pars(format_pandas=False)
   5857         # Note: for simplicity, this is put into 'init_aux_pars'. When loading the model, 'init_aux_pars' are correctly set
   5858         model_dict["model_fitted"] = self.model_fitted

~/Desktop/Codes/WinProbability/carrot_v1/lib/python3.9/site-packages/gpboost/basic.py in get_aux_pars(self, format_pandas)
   5126         else:
   5127             aux_pars = None
-> 5128         return aux_pars
   5129
   5130     def summary(self):

UnboundLocalError: local variable 'aux_pars' referenced before assignment
```
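
For context, this kind of `UnboundLocalError` typically appears when a local name is only assigned inside some conditional branches and a branch that assigns nothing falls through to the `return`. The snippet below is a hypothetical illustration of that pattern, not GPBoost's actual code:

```python
def get_aux_pars(likelihood):
    # Hypothetical sketch of the failure mode (not GPBoost's actual code):
    # 'aux_pars' is only bound in some branches, so a branch that assigns
    # nothing reaches 'return' with the local name still unbound.
    if likelihood == "gamma":
        aux_pars = [1.0]      # e.g. a shape parameter
    elif likelihood == "gaussian":
        aux_pars = None
    return aux_pars           # UnboundLocalError for any other likelihood

get_aux_pars("bernoulli_probit")  # raises UnboundLocalError: 'aux_pars' referenced before assignment
```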

fabsig commented

Thanks for using GPBoost!

Can you please provide a reproducible example (it works for me)?

I have a panel dataset, but my target is binary. Do you think it's a good idea to train a GPModel and then apply it in the GPBoostClassifier?

...and it doesn't matter which model or data I'm using; neither joblib nor pickle works.

```python
import joblib
import pickle

import gpboost as gpb

gpboost = gpb.GPBoostClassifier(
    num_leaves=200,
    max_depth=10,
    learning_rate=0.01,
    objective='binary'
)

gpboost.fit(
    X=train_data[boost_features],
    y=train_data['TARGET'],
    gp_model=gpbs.gp_model,
    train_gp_model_cov_pars=False,
    verbose=2
)

# Both of these fail with the UnboundLocalError above
joblib.dump(gpboost, f'{MODEL_PATH}/gpmodel_booster.joblib')
pickle.dump(gpboost, open(f'{MODEL_PATH}/gpmodel_booster.joblib', 'wb'))
```

Thanks!

fabsig commented

Can you please provide a reproducible example including data (e.g., simulated) so that I can reproduce the error?

Yes, sure!

```python
import gpboost as gpb
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("https://raw.githubusercontent.com/fabsig/Compare_ML_HighCardinality_Categorical_Variables/master/data/wages.csv.gz")
data = data.assign(t_sq=data['t']**2)  # Add t^2

# Partition into training and test data
n = data.shape[0]
np.random.seed(n)
permute_aux = np.random.permutation(n)
train_idx = permute_aux[0:int(0.8 * n)]
test_idx = permute_aux[int(0.8 * n):n]
data_train = data.iloc[train_idx]
data_test = data.iloc[test_idx]

# Define fixed effects predictor variables
pred_vars = [col for col in data.columns if col not in ['ln_wage', 'idcode', 't', 't_sq']]
```

fabsig commented

Sorry, but this code contains no calls to any GPBoost functions. Can you please provide a reproducible example including data (e.g., simulated) so that I can reproduce the error?

Yes, here is the code:

```python
import joblib
import pickle

import gpboost as gpb
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("https://raw.githubusercontent.com/fabsig/Compare_ML_HighCardinality_Categorical_Variables/master/data/wages.csv.gz")
data = data.assign(t_sq=data['t']**2)  # Add t^2

# Partition into training and test data
n = data.shape[0]
np.random.seed(n)
permute_aux = np.random.permutation(n)
train_idx = permute_aux[0:int(0.8 * n)]
test_idx = permute_aux[int(0.8 * n):n]
data_train = data.iloc[train_idx]
data_test = data.iloc[test_idx]

# Define fixed effects predictor variables
pred_vars = [col for col in data.columns if col not in ['ln_wage', 'idcode', 't', 't_sq']]

gpboost = gpb.GPBoostClassifier(
    num_leaves=200,
    max_depth=10,
    learning_rate=0.01,
    objective='binary'
)

gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')
data_bst = gpb.Dataset(data=data_train[pred_vars], label=data_train['ln_wage'])

gpboost.fit(
    X=data_train[pred_vars],
    y=data_train['ln_wage'],
    gp_model=gp_model,
    train_gp_model_cov_pars=False
)

# ERROR HERE
joblib.dump(gpboost, f'{MODEL_PATH}/gpmodel_booster.joblib')
pickle.dump(gpboost, open(f'{MODEL_PATH}/gpmodel_booster.joblib', 'wb'))
```

fabsig commented

I am getting the following error when running your code: `ValueError: Unknown label type: 'continuous'`.

You are trying to give a continuous label variable to a binary classifier.

Please, could you try with this?

`gpboost = gpb.GPBoostRegressor(num_leaves=200, max_depth=10, learning_rate=0.01)`
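
For completeness, here is what that suggestion looks like end to end, reusing the data preparation from the example above; this is only a sketch, and the fit arguments simply mirror the earlier snippets in this thread:

```python
# Sketch: same data preparation as above, but with a regressor for the
# continuous label 'ln_wage' instead of a binary classifier.
gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')

gpboost_reg = gpb.GPBoostRegressor(
    num_leaves=200,
    max_depth=10,
    learning_rate=0.01,
)
gpboost_reg.fit(
    X=data_train[pred_vars],
    y=data_train['ln_wage'],   # continuous label, so no 'Unknown label type' error
    gp_model=gp_model,
    train_gp_model_cov_pars=False
)
```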

I fixed a bug when saving models (related to aux_pars). Your error should no longer appear (with version 1.2.7 or later).

FWIW: on my machine, no error occurred with your code even on earlier versions of GPBoost; it runs (and did run) fine. In any case, I would not save models using pickle or joblib (I am not sure whether this works correctly), but rather use GPBoost's internal saving option: see here for an example.
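
Roughly, the internal saving option looks like the following. This is a sketch based on the native training API used in the GPBoost demos; the parameter values are illustrative, and the exact `predict` arguments should be checked against the linked example:

```python
# Train with GPBoost's native API so the GPModel is serialized together with the booster
gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')
data_bst = gpb.Dataset(data=data_train[pred_vars], label=data_train['ln_wage'])
bst = gpb.train(params={'learning_rate': 0.01, 'max_depth': 10, 'num_leaves': 200},
                train_set=data_bst, gp_model=gp_model, num_boost_round=100)

# Internal saving instead of pickle/joblib
bst.save_model('model.json')

# Load the model again and predict
bst_loaded = gpb.Booster(model_file='model.json')
pred = bst_loaded.predict(data=data_test[pred_vars],
                          group_data_pred=data_test['idcode'])
```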

Thanks a lot for reporting this issue!