Speedup with XGBoost classifier runtime error
adendek opened this issue · 3 comments
adendek commented
I tried to use speedup.LookupClassifier with XGBoost as the base_estimator, but it failed. This may be a bug in the LookupClassifier implementation.
I executed the following Python code:
train_X, test_X, train_Y, test_Y = train_test_split(new_features, target, random_state=42, train_size=0.5)
base_classifier = xgb.XGBClassifier(n_estimators=400, learning_rate=0.07, scale_pos_weight=ratio_ghost_to_good)
classifier = LookupClassifier(base_estimator=base_classifier, keep_trained_estimator=False)
classifier.fit(train_X, train_Y)
and obtained the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-afeadcdc082e> in <module>()
3 base_classifier = xgb.XGBClassifier(n_estimators=400, learning_rate=0.07 ,scale_pos_weight=ratio_ghost_to_good)
4 classifier = LookupClassifier(base_estimator=base_classifier, keep_trained_estimator=False)
----> 5 classifier.fit(train_X, train_Y)
/afs/cern.ch/user/a/adendek/.local/lib/python2.7/site-packages/hep_ml/speedup.pyc in fit(self, X, y, sample_weight)
91 all_lookup_indices = numpy.arange(int(n_parameter_combinations))
92 all_combinations = self.convert_lookup_index_to_bins(all_lookup_indices)
---> 93 self._lookup_table = trained_estimator.predict_proba(all_combinations)
94
95 if self.keep_trained_estimator:
/afs/cern.ch/user/a/adendek/.local/lib/python2.7/site-packages/xgboost/sklearn.pyc in predict_proba(self, data, output_margin, ntree_limit)
475 class_probs = self.booster().predict(test_dmatrix,
476 output_margin=output_margin,
--> 477 ntree_limit=ntree_limit)
478 if self.objective == "multi:softprob":
479 return class_probs
/afs/cern.ch/user/a/adendek/.local/lib/python2.7/site-packages/xgboost/core.pyc in predict(self, data, output_margin, ntree_limit, pred_leaf)
937 option_mask |= 0x02
938
--> 939 self._validate_features(data)
940
941 length = ctypes.c_ulong()
/afs/cern.ch/user/a/adendek/.local/lib/python2.7/site-packages/xgboost/core.pyc in _validate_features(self, data)
1177
1178 raise ValueError(msg.format(self.feature_names,
-> 1179 data.feature_names))
1180
1181 def get_split_value_histogram(self, feature, fmap='', bins=None, as_pandas=True):
ValueError: feature_names mismatch: [u'seed_chi2PerDoF', u'seed_p', u'seed_pt', u'seed_nLHCbIDs', u'seed_nbIT', u'seed_nLayers', u'seed_x', u'seed_y', u'seed_tx', u'seed_ty', u'abs_seed_x', u'abs_seed_y', u'abs_seed_tx', u'abs_seed_ty', u'seed_r', u'pseudo_rapidity'] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15']
expected seed_nbIT, abs_seed_y, abs_seed_x, seed_tx, seed_pt, seed_nLayers, seed_x, seed_y, seed_ty, pseudo_rapidity, seed_p, seed_r, abs_seed_tx, abs_seed_ty, seed_nLHCbIDs, seed_chi2PerDoF in input data
training data did not have the following fields: f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f12, f13, f10, f11, f14, f15
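The traceback shows what goes wrong: XGBClassifier was fit on a pandas DataFrame, so it recorded the column names, but LookupClassifier builds its lookup-table grid as a bare numpy array, which XGBoost labels with default names f0…f15 and then rejects during `_validate_features`. A minimal sketch of the pre-fix workaround (the variable names below are hypothetical, and only two of the actual feature columns are used for brevity) is to wrap the grid in a DataFrame carrying the training column names before calling predict_proba:

```python
import numpy as np
import pandas as pd

# Names XGBoost memorized at fit time (two of the real columns, for brevity).
feature_names = ['seed_p', 'seed_pt']

# LookupClassifier's internal grid of bin combinations is a plain ndarray;
# passed as-is, XGBoost would see default names 'f0', 'f1' and raise
# "feature_names mismatch".
all_combinations = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Workaround: restore the original column names before prediction, so
# feature validation passes.
framed = pd.DataFrame(all_combinations, columns=feature_names)
```

This mirrors the kind of change the fix in the develop branch would need to make inside `LookupClassifier.fit`; the sketch only demonstrates the naming mismatch, not the actual patch.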
arogozhnikov commented
@adendek Thanks for reporting; I had forgotten about this scenario.
arogozhnikov commented
I've fixed this in the develop branch, try it now:
pip uninstall hep_ml
pip install https://github.com/arogozhnikov/hep_ml/archive/develop.zip
adendek commented
Great! It works!