awslabs/privacy-preserving-xgboost-inference

BoostParser Error if DataFrame Column Name has Space

Wei-1 opened this issue · 1 comment

Wei-1 commented

In the boostparser:

Currently, the parser throws an exception when a column name contains a space.

For example, with the breast_cancer dataset, a split node is tokenized like this:

import re

current_node = "0:[worst radius<16.7950001] yes=1,no=2,missing=1"
# The character class has no space, so "worst radius" is split into two tokens.
leaf_strs = re.findall(r"[\w.-]+", current_node)
print(leaf_strs)
# ['0', 'worst', 'radius', '16.7950001', 'yes', '1', 'no', '2', 'missing', '1']

That is 10 tokens instead of the expected 9, so the node fails the validator check if len(leaf_strs) != 9:.

To fix this, we might need to rewrite the string parsing logic, or substitute a placeholder for the spaces in column names before parsing.
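As a rough illustration of the first option, the sketch below parses a split node with one structured pattern instead of tokenizing on word characters. It is not the repository's code and not necessarily what the eventual fix does: NODE_PATTERN and parse_split_node are hypothetical names, and the pattern assumes the "id:[feature<threshold] yes=..,no=..,missing=.." layout shown in the example above.

```python
import re

# Hypothetical pattern: the feature name is everything between "[" and "<",
# so embedded spaces are kept as part of the name.
NODE_PATTERN = re.compile(
    r"^\s*(?P<node_id>\d+):\[(?P<feature>[^<\]]+)<(?P<threshold>[^\]]+)\]\s*"
    r"yes=(?P<yes>\d+),no=(?P<no>\d+),missing=(?P<missing>\d+)\s*$"
)


def parse_split_node(line):
    """Parse one split-node line of a text tree dump into a dict."""
    match = NODE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"Unrecognized split node: {line!r}")
    return {
        "node_id": int(match["node_id"]),
        "feature": match["feature"],
        "threshold": float(match["threshold"]),
        "yes": int(match["yes"]),
        "no": int(match["no"]),
        "missing": int(match["missing"]),
    }


parsed = parse_split_node("0:[worst radius<16.7950001] yes=1,no=2,missing=1")
print(parsed["feature"])
```

Named groups avoid depending on a fixed token count, so a feature name with any number of spaces still maps to a single field.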

At the very least, the exception message should tell the user to remove the spaces from the column names.
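A minimal sketch of that friendlier error, assuming the existing tokenizer and the nine-token check stay as they are. The function name tokenize_node and the wording of the message are made up for illustration.

```python
import re


def tokenize_node(line):
    """Tokenize a split node; explain the likely cause if the count is off."""
    tokens = re.findall(r"[\w.-]+", line)
    if len(tokens) != 9:
        hint = ""
        # Look at the text between "[" and "<", which holds the feature name.
        feature = re.search(r"\[([^<\]]+)<", line)
        if feature and " " in feature.group(1):
            hint = (
                f" Feature name {feature.group(1)!r} contains a space;"
                " rename the column (for example, replace spaces with"
                " underscores) before training the model."
            )
        raise ValueError(f"Expected 9 tokens, got {len(tokens)}.{hint}")
    return tokens


try:
    tokenize_node("0:[worst radius<16.7950001] yes=1,no=2,missing=1")
except ValueError as err:
    print(err)
```

With an underscore in place of the space (worst_radius), the same line tokenizes into nine entries and passes the check.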

Fixed by #11