biothings/biothings.api

`boolean_convert` function converts `None` to `False`

erikyao opened this issue · 1 comments

MWE

from biothings.utils.dataload import boolean_convertdoc = {
    "_id" : "CHEMBL3545192",
    "chembl" : { "molecule_properties" : None }
}
​
doc = boolean_convert(doc, ["molecule_properties.ro3_pass"])
​
print(doc)

Output:

{
   "_id": "CHEMBL3545192", 
   "chembl": { "molecule_properties": False }
}

Analysis

Involved function is boolean_convert(d, convert_keys=None, level=0)

This function calls to_boolean(val, true_str=None, false_str=None) function in the same module (see dataload.py#L160), whose behaviors include:

if not isinstance(val, str):
    return bool(val)

When val is None, the above code returns bool(None), which, out of my expectation, evaluates to False.

Consequence

Python's evaluation of bool(None) is discussed in this post, and its rationality is out of our topic.

My concern is that users of boolean_convert function may be unaware of the False values converted from None field values. It could lead to further problems like "failure to index":

{
  "index": {
    "_index": "chem_20230209_tste2bgb",
    "_id": "ZAJHBCDPAUKEPU-GIKXZWSFSA-N",
    "status": 400,
    "error": {
      "type": "mapper_parsing_exception",
      "reason": "object mapping for [chembl.molecule_properties] tried to parse field [molecule_properties] as object, but found a concrete value"
    }
  }
}

Workaround

Use dict_sweep function in the same module (see dataload.py#L22) to remove None fields BEFORE calling boolean_convert.

Not a bug; delegated to Issue #281 for additional comments