`boolean_convert` function converts `None` to `False`
erikyao opened this issue · 1 comments
MWE
from biothings.utils.dataload import boolean_convert
doc = {
"_id" : "CHEMBL3545192",
"chembl" : { "molecule_properties" : None }
}
doc = boolean_convert(doc, ["molecule_properties.ro3_pass"])
print(doc)
Output:
{
"_id": "CHEMBL3545192",
"chembl": { "molecule_properties": False }
}
Analysis
Involved function is boolean_convert(d, convert_keys=None, level=0)
- Module: biothings.api/biothings/utils/dataload.py
- Line: dataload.py#L113
This function calls to_boolean(val, true_str=None, false_str=None)
function in the same module (see dataload.py#L160), whose behaviors include:
if not isinstance(val, str):
return bool(val)
When val
is None
, the above code returns bool(None)
, which, out of my expectation, evaluates to False
.
Consequence
Python's evaluation of bool(None)
is discussed in this post, and its rationality is out of our topic.
My concern is that users of boolean_convert
function may be unaware of the False
values converted from None
field values. It could lead to further problems like "failure to index":
{
"index": {
"_index": "chem_20230209_tste2bgb",
"_id": "ZAJHBCDPAUKEPU-GIKXZWSFSA-N",
"status": 400,
"error": {
"type": "mapper_parsing_exception",
"reason": "object mapping for [chembl.molecule_properties] tried to parse field [molecule_properties] as object, but found a concrete value"
}
}
}
Workaround
Use dict_sweep
function in the same module (see dataload.py#L22) to remove None
fields BEFORE calling boolean_convert
.