Send boolean values as boolean rather than integers
Hi again,
The new enum metadata works great for boolean datasets/attributes.
However, values are returned as `Int8Array` for arrays and as integers for scalars. This means the conversion must be done by the consumer, which causes issues for nD datasets/attributes.
Do you think it would be possible to return arrays of booleans (or simply booleans for scalars) instead?
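To make the consumer-side conversion concrete, here is a minimal sketch. `toBooleans` is a hypothetical helper written for illustration and is not part of the library's API; it only assumes the behaviour described above (an integer for scalars, a flat `Int8Array` for arrays).

```javascript
// Hypothetical helper (not a library function): converts a value as it is
// currently returned (integer scalar or flat Int8Array) into JS booleans.
function toBooleans(value) {
  if (typeof value === "number") {
    return value !== 0; // scalar case
  }
  // Array case: the result stays flat, the dataset shape is not recovered here.
  return Array.from(value, (v) => v !== 0);
}

console.log(toBooleans(1));
console.log(toBooleans(new Int8Array([1, 0])));
console.log(toBooleans(new Int8Array([1, 0, 1, 0])));
```

The last call shows the nD problem: a 2x2 dataset comes back as four flat values, so the consumer also needs the shape to rebuild the nesting.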
What h5py does in handling boolean datasets (and complex datasets for that matter) is a little "magic", given that boolean datatypes are not implemented directly in the HDF5 specification. I would prefer more explicit mechanisms, but if there is overwhelming benefit it could of course be done.
Continuing discussion from silx-kit/h5web#1112 (comment)
There are performance (and other) issues with using nested arrays instead of typed arrays. I still don't understand the scope of the request - is it preferred in your use case that all dataset and attribute values be converted to nested arrays, or just attribute values, or just "boolean" attribute values?
Ideally, I would like boolean values (regardless of whether they come from a dataset or an attribute) to be returned as nested arrays of booleans.
Examples:
Original dataset/attribute in h5py (Py) | Ideal returned value (JS) | Actual returned value (JS) |
---|---|---|
True | true | 1 |
[True, False] | [true, false] | Int8Array(2) [ 1, 0 ] |
[ [True, False], [True, False] ] | [ [true, false], [true, false] ] | Int8Array(4) [1, 0, 1, 0] |
I agree that it is a bit of "magic", as you said, which is why I wanted to discuss it with you first. I also think the performance issues are mitigated, since nested arrays would only be used for booleans and I don't expect huge boolean datasets/attributes.
What is shown for h5py in the table above is really the result of two levels of special handling. First, h5py follows a convention that any enum with members {"TRUE", "FALSE"} should trigger the creation of a numpy array with dtype 'bool' (it still passes a 1D buffer of bytes into that numpy array on construction). Then, numpy has special `__repr__` and `tolist` serialization methods for bool arrays. The result of `tolist` is what is shown above.
I would consider adding a `tolist` method to datasets and attributes, since we don't have a nice intermediate container library in javascript that corresponds to the role of numpy.ndarray with h5py.
I would also agree that we could add special handling for the case of enum {"TRUE", "FALSE"}, where we could say that this is "in agreement with the h5py convention" for creating boolean arrays in HDF5, and return (flat, 1D) arrays of JS boolean values from `.value` (and possibly nested values from `.tolist()`, as above).
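A rough sketch of what such special handling could look like follows. The metadata layout is an assumption made for illustration: `members` is taken to be a plain object mapping enum names to integer values, and both function names are hypothetical rather than taken from the library. The FALSE = 0, TRUE = 1 mapping follows the h5py convention.

```javascript
// Assumption: `members` maps enum member names to integer values,
// e.g. { FALSE: 0, TRUE: 1 }. This layout is illustrative only.
function isH5pyBooleanEnum(members) {
  return (
    Object.keys(members).length === 2 &&
    members.FALSE === 0 &&
    members.TRUE === 1
  );
}

// Hypothetical decoder: returns a flat array of JS booleans for the
// h5py-style boolean enum and leaves every other enum untouched.
function decodeEnumValues(raw, members) {
  if (!isH5pyBooleanEnum(members)) {
    return raw;
  }
  return Array.from(raw, (v) => v !== 0);
}

console.log(decodeEnumValues(new Int8Array([1, 0]), { FALSE: 0, TRUE: 1 }));
console.log(decodeEnumValues(new Int8Array([1, 0]), { RED: 0, GREEN: 1 }));
```

Keeping the check strict (exactly two members with those names and values) means other enums keep returning typed arrays, so the default behaviour stays the same for them.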
I don't think it would make sense to return nested arrays from `Dataset.value` or `Attribute.value` for just this one particular instance of the enum datatype, when none of the other datatypes (float, int, enum...) would be handled this way.
EDIT: I just realized that the enum has to be {"FALSE", "TRUE"} instead of the other way around, but that doesn't affect the above discussion.
see #26
Thanks, #26 is already a great improvement!
Given your concerns, having a `tolist` method would indeed be nice. This way, the default behaviour of returning typed arrays would not change, and I could opt in to nested arrays by calling this method.
An implementation of `to_array` is in #27 if you want to comment on it. I think it roughly does what `tolist` does in numpy.
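For readers unfamiliar with numpy's `tolist`, the idea can be sketched as below. This is not the code from #27; `nest` is a hypothetical helper, and it assumes the flat values are stored in row-major order and that the shape is available as an array of dimension sizes.

```javascript
// Hypothetical helper: nests a flat, row-major array (typed or plain)
// into plain JS arrays according to `shape`.
function nest(flat, shape) {
  if (shape.length === 0) {
    return flat[0]; // scalar: no nesting
  }
  const build = (offset, dim) => {
    const size = shape[dim];
    if (dim === shape.length - 1) {
      // Innermost dimension: copy the values into a plain array.
      return Array.from(flat.slice(offset, offset + size));
    }
    // Number of flat elements covered by one step along this dimension.
    const stride = shape.slice(dim + 1).reduce((a, b) => a * b, 1);
    const out = [];
    for (let i = 0; i < size; i++) {
      out.push(build(offset + i * stride, dim + 1));
    }
    return out;
  };
  return build(0, 0);
}

console.log(nest(new Int8Array([1, 0, 1, 0]), [2, 2]));
console.log(nest([true, false, true, false], [2, 2]));
```

Because the nesting is opt-in, the default typed-array value is untouched and the cost of building nested arrays is only paid when the caller asks for it.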
v0.4.4 fits the bill for me!