DataArray.__eq__ with str dtype
This looks good:
>> x = DataArray([1, 2])
>> x == 1
DataArray([ True, False], dtype=bool)
(None,)
This doesn't (should return DataArrays):
>> x = DataArray(['a', 'b'])
>> x == 'a'
array([ True, False], dtype=bool)
>> x == 1
False
This is what numpy does:
In [38]: x = np.array([1,2])
In [39]: x == 1
Out[39]: array([ True, False], dtype=bool)
In [40]: x = np.array(['a', 'b'])
In [41]: x == 'a'
Out[41]: array([ True, False], dtype=bool)
In [42]: x == 1
Out[42]: False
We may not be able to change the fact that the last case collapses to a plain boolean False instead of returning an array, but we should at least fix the second example so that it returns a DataArray and not a plain array.
Hrm, this seems like a potential headache. If you dig through the NumPy source, comparisons between character arrays follow a special code path that is different from the other comparison logic (see compare_chararrays() in numpy/core/src/multiarray/multiarraymodule.c).
One potential fix would be to override __eq__ in DataArray; not sure if this is a good idea.
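For the sake of discussion, here is a rough sketch of what such an override could look like. It does not use datarray itself: LabeledArray and its names attribute are hypothetical stand-ins for DataArray and its axis labels, and the re-wrapping policy (only wrap when the result has the same shape as self) is an assumption, not a proposal for the real API.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical stand-in for DataArray: an ndarray subclass with axis names."""

    def __new__(cls, data, names=None):
        obj = np.asarray(data).view(cls)
        obj.names = tuple(names) if names is not None else (None,) * obj.ndim
        return obj

    def __array_finalize__(self, obj):
        # Called for views and new-from-template arrays; inherit names if present.
        if obj is None:
            return
        self.names = getattr(obj, "names", None)

    def __eq__(self, other):
        # Compare on plain ndarrays so the result does not depend on which
        # internal code path numpy picks for this dtype.
        other_data = np.asarray(other) if isinstance(other, np.ndarray) else other
        result = np.asarray(self) == other_data
        # Re-wrap only when we got an elementwise result of our own shape;
        # anything else (e.g. a bare False) is passed through untouched.
        if isinstance(result, np.ndarray) and result.shape == self.shape:
            wrapped = result.view(type(self))
            wrapped.names = self.names
            return wrapped
        return result


x = LabeledArray(['a', 'b'], names=['t'])
mask = x == 'a'  # a LabeledArray carrying the names of x
```

The same treatment would be needed for __ne__ and the ordering comparisons, which is part of why this may not be a good idea.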
Here is another quirk:
>>> A = DataArray([[1, 2, 3], [4, 5, 6]])
>>> B = DataArray([[1, 2, 3], [4, 5, 6]], 'ab')
>>> C = DataArray([[1, 2, 3], [4, 5, 6]], 'cd')
>>> A == B
DataArray(array([[ True, True, True],
[ True, True, True]], dtype=bool),
('a', 'b'))
>>> A == C
DataArray(array([[ True, True, True],
[ True, True, True]], dtype=bool),
('c', 'd'))
>>> B == C
False
I'm not too crazy about the idea of overriding __eq__: the more special methods we override, the trickier merging back with numpy will be. What I don't understand is why we end up with a base array on output for chararrays. Even if numpy takes a different code path for char arrays, it should still honor the policy of using our finalizers to return our own class instead of the base one, no? I may be mistaken, but this sounds to me more like a numpy problem than a datarray one...
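One way to check whether this really is a numpy-side issue is to take datarray out of the picture and probe a bare subclass. The snippet below is only a diagnostic sketch: Probe and result_type are hypothetical names, and the outcome for the string case is expected to depend on the numpy version installed, so no output is shown.

```python
import numpy as np


class Probe(np.ndarray):
    """Hypothetical minimal subclass that counts __array_finalize__ calls."""

    finalize_calls = 0

    def __array_finalize__(self, obj):
        type(self).finalize_calls += 1


def result_type(values, scalar):
    """Return the type name of `arr == scalar` and the finalize calls it caused."""
    arr = np.asarray(values).view(Probe)
    Probe.finalize_calls = 0  # ignore the call triggered by the view above
    out = arr == scalar
    return type(out).__name__, Probe.finalize_calls


int_type, int_calls = result_type([1, 2], 1)
str_type, str_calls = result_type(['a', 'b'], 'a')
print("numpy", np.__version__)
print("int == int :", int_type, "finalize calls:", int_calls)
print("str == str :", str_type, "finalize calls:", str_calls)
```

If the string comparison reports ndarray while the integer one reports Probe, the subclass is being dropped inside numpy's comparison path rather than by anything in datarray.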