Blosc/bcolz

np.datetime64 not handled properly in fetchwhere

apalepu23 opened this issue · 2 comments

Hi,

I'm trying to use the fetchwhere function to filter a bcolz table I have, but it fails when the filter compares a column of np.datetime64 type, raising the following error: IndexError: invalid index to scalar variable.

Here's a small snippet of code that generates this error:

import numpy as np
import bcolz

N = 5
# Two-column ctable: f0 (int32) and f1 (float64)
ct = bcolz.fromiter(((i, i*i) for i in range(N)), dtype="i4,f8", count=N)
# Append a datetime64 column, referenced as f2 in the query below
new_col = [np.datetime64('2018-03-01'), np.datetime64('2018-03-02'), np.datetime64('2018-03-03'), np.datetime64('2018-03-04'), np.datetime64('2018-03-05')]
ct.addcol(new_col)
threshold = np.datetime64('2018-03-03')
# Raises IndexError: invalid index to scalar variable.
ct.fetchwhere('(f2 > threshold)', user_dict={'threshold': threshold})
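
For reference, the rows this query is meant to select can be reproduced in plain NumPy. This is a minimal sketch that assumes a structured array mirroring the ctable above (fields f0, f1, f2); it does not use bcolz at all.

```python
import numpy as np

N = 5
# Structured array mirroring the ctable: f0 (i4), f1 (f8), f2 (datetime64[D])
rows = np.zeros(N, dtype=[("f0", "i4"), ("f1", "f8"), ("f2", "datetime64[D]")])
rows["f0"] = np.arange(N)
rows["f1"] = np.arange(N) ** 2
rows["f2"] = np.arange("2018-03-01", "2018-03-06", dtype="datetime64[D]")

# Same predicate as the fetchwhere expression '(f2 > threshold)'
threshold = np.datetime64("2018-03-03")
selected = rows[rows["f2"] > threshold]
print(selected)
```

Only the last two rows (dates 2018-03-04 and 2018-03-05) pass the filter.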

I've attached a simple IPython notebook in which I run the bcolz snippet above: filtering with an int parameter succeeds, but filtering with a np.datetime64 parameter fails. I believe the issue lies in the _eval_blocks function in chunked_eval.py, which performs the following check:

if hasattr(var, "__getitem__"):
    vars_[name] = var[:]

Unfortunately, np.datetime64 objects satisfy that condition: they have a __getitem__ attribute even though they behave like scalar values, so var[:] raises the IndexError above. Could we perhaps use a different check than if hasattr(var, "__getitem__")? Maybe something like np.isscalar()?
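
The diagnosis can be checked without bcolz. In this minimal sketch, materialize_var is a hypothetical helper that only mimics the quoted check with the np.isscalar() guard proposed above; it is not the actual bcolz code or the eventual fix.

```python
import numpy as np

threshold = np.datetime64("2018-03-03")
col = np.arange(5)

# NumPy scalars (subclasses of np.generic, including np.datetime64) define
# __getitem__, so the duck-typing test treats them like arrays.
print(hasattr(threshold, "__getitem__"))  # True
print(np.isscalar(threshold))             # True

# Slicing a NumPy scalar is what produces the reported IndexError.
raised = False
try:
    threshold[:]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)


def materialize_var(var):
    """Hypothetical stand-in for the per-variable step in _eval_blocks.

    Array-like operands are read with var[:]; scalars, including NumPy
    scalars such as np.datetime64, are passed through unchanged.
    """
    if hasattr(var, "__getitem__") and not np.isscalar(var):
        return var[:]
    return var


print(materialize_var(threshold))  # passed through unchanged
print(materialize_var(col))        # sliced like an array
```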

Thanks!
bcolz_test.pdf

Sure. Could you test the different approaches? If one of them does what you want and passes the test suite, please file a pull request; I'll be glad to include it in the forthcoming release.
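
One way to compare the approaches is to run each candidate check over a few operand types. This is a minimal sketch: the three predicate names are hypothetical, and by_ndim (based on np.ndim()) is an extra candidate not mentioned in the thread.

```python
import numpy as np


# Hypothetical predicates answering "should this operand be read with var[:]?"
def by_hasattr(var):
    return hasattr(var, "__getitem__")


def by_isscalar(var):
    return hasattr(var, "__getitem__") and not np.isscalar(var)


def by_ndim(var):
    return np.ndim(var) > 0


operands = {
    "python int": 3,
    "np.datetime64": np.datetime64("2018-03-03"),
    "list": [1, 2, 3],
    "1-d ndarray": np.arange(3),
    "0-d ndarray": np.array(5),
}

for label, var in operands.items():
    print(f"{label:15} hasattr={by_hasattr(var)!s:5} "
          f"isscalar={by_isscalar(var)!s:5} ndim={by_ndim(var)}")
```

With these inputs, the np.isscalar() guard and the np.ndim() test agree except on a 0-d array, which np.isscalar() does not treat as a scalar even though var[:] on it also fails.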

Hi Francesc,

Thanks so much for the speedy response! I just submitted a pull request (#377) from my forked repository. It passes the test suite and does what I need, but feel free to run any other tests you'd like. bcolz is an awesome package, so I'm happy to contribute!