DataArray.eq with str dtype

Question

Opened this issue 14 years ago · 5 comments

This looks good:

>> x = DataArray([1, 2])
>> x == 1
   DataArray([ True, False], dtype=bool)
   (None,)

This doesn't (should return DataArrays):

>> x = DataArray(['a', 'b'])
>> x == 'a'
   array([ True, False], dtype=bool)
>> x == 1
   False

Answer 1 · 2010-07-29T01:26:52.000Z

This is what numpy does:

In [38]: x = np.array([1,2])

In [39]: x  == 1
Out[39]: array([ True, False], dtype=bool)

In [40]: x = np.array(['a', 'b'])

In [41]: x == 'a'
Out[41]: array([ True, False], dtype=bool)

In [42]: x == 1
Out[42]: False

Answer 2 · 2010-07-29T01:30:25.000Z

We may not be able to change the fact that the last case drops to a boolean False instead of returning an array, but we should at least fix the second example so that it returns a DataArray and not a plain array.

Answer 3 · 2011-06-02T11:29:18.000Z

Hrm, this seems like a potential headache. If you dig through the NumPy source, comparisons between character arrays follow a special code path that is separate from the other comparison logic (see compare_chararrays() in numpy/core/src/multiarray/multiarraymodule.c).

One potential fix would be to override __eq__ in DataArray; not sure if this is a good idea.
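
For concreteness, here is a rough sketch of what such an override could look like. It is not the datarray implementation: LabeledArray and its names attribute are hypothetical stand-ins for DataArray and its axis labels, and the approach (compare the underlying plain arrays, then re-wrap the result) is only one possible design.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical stand-in for DataArray: an ndarray subclass with axis names."""

    def __new__(cls, data, names=None):
        obj = np.asarray(data).view(cls)
        obj.names = names
        return obj

    def __array_finalize__(self, obj):
        # Inherit names from the array we were created from, if it has any.
        self.names = getattr(obj, "names", None)

    def __eq__(self, other):
        # Compare as plain ndarrays, whatever code path NumPy picks for the dtype.
        result = np.asarray(self) == np.asarray(other)
        if isinstance(result, np.ndarray):
            # Re-wrap array results so the caller gets our class back.
            out = result.view(type(self))
            out.names = self.names
            return out
        # Scalar results (e.g. a bare False) are passed through unchanged.
        return result


x = LabeledArray(["a", "b"], names=("letters",))
result = x == "a"
print(type(result).__name__, result.names)
```

The downside is that __ne__, __lt__ and the other comparisons would need the same treatment, which is part of why this may not be a good idea.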

Answer 4 · 2011-06-02T11:40:06.000Z

Here is another quirk:

>>> A = DataArray([[1, 2, 3], [4, 5, 6]])
>>> B = DataArray([[1, 2, 3], [4, 5, 6]], 'ab')
>>> C = DataArray([[1, 2, 3], [4, 5, 6]], 'cd')
>>> A == B
DataArray(array([[ True,  True,  True],
       [ True,  True,  True]], dtype=bool),
('a', 'b'))
>>> A == C
DataArray(array([[ True,  True,  True],
       [ True,  True,  True]], dtype=bool),
('c', 'd'))
>>> B == C
False

Answer 5 · 2011-06-03T06:26:41.000Z

I'm not too crazy about the idea of overriding __eq__: the more special methods we override, the trickier merging back with numpy will be. What I don't understand is why we end up with a base array on output for chararrays. Even if numpy takes a different code path for char arrays, it should still honor the policy of using our finalizers to return our own class instead of the base one, no? I may be mistaken, but this sounds to me more like a numpy problem than a datarray one...
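
To illustrate the finalizer policy I mean, here is a minimal sketch. TracedArray is a made-up subclass used only for this demonstration. It shows the numeric comparison handing back the subclass; whether the string case behaves the same way is exactly the open question here and may depend on the NumPy version.

```python
import numpy as np


class TracedArray(np.ndarray):
    """Hypothetical ndarray subclass that counts __array_finalize__ calls."""

    finalize_calls = 0

    def __array_finalize__(self, obj):
        # NumPy calls this hook whenever it creates an instance of the subclass.
        type(self).finalize_calls += 1


x = np.array([1, 2]).view(TracedArray)

# Numeric comparison: the result comes back as the subclass, not a plain ndarray.
result = x == 1
print(type(result).__name__)
```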