AttributeError: no column named 'reshape' for ChunkedArray with jagged content
raymondEhlers opened this issue · 4 comments
When attempting to process a ChunkedArray containing jagged values loaded via uproot.lazyarrays(...)
, I receive: AttributeError: no column named 'reshape'
. The full traceback can be seen in the following example:
In [14]: arrays["data_z"]
Out[14]: <ChunkedArray [[0.22628734 0.094208576 0.11197069 0.23886986] [0.2817931 0.02485017 0.31829283 ... 0.37544665 0.39175743 0.063571155] [0.084453866 0.022292202] ... [0.0815998 0.48151806 0.27003774 0.35759225 0.47045997 0.29828948] [0.049144566 0.26100245 0.37040257 ... 0.11787954 0.3755424 0.45226774] [0.4637775 0.3404903 0.3402615 ... 0.20760414 0.16756321 0.47770718]] at 0x00010f6814d0>
In [15]: np.sin(arrays["data_z"]) / arrays["data_z"]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
test.py in <module>
----> 1 np.sin(arrays["data_z"]) / arrays["data_z"]
.venv/lib/python3.7/site-packages/numpy/lib/mixins.py in func(self, other)
23 if _disables_array_ufunc(other):
24 return NotImplemented
---> 25 return ufunc(self, other)
26 func.__name__ = '__{}__'.format(name)
27 return func
.venv/lib/python3.7/site-packages/awkward/array/chunked.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
560 types = {}
561 for batch in batches:
--> 562 result = getattr(ufunc, method)(*batch, **kwargs)
563
564 if isinstance(result, tuple):
.venv/lib/python3.7/site-packages/awkward/array/chunked.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
560 types = {}
561 for batch in batches:
--> 562 result = getattr(ufunc, method)(*batch, **kwargs)
563
564 if isinstance(result, tuple):
.venv/lib/python3.7/site-packages/awkward/array/jagged.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
1025 return content
1026
-> 1027 content = recurse(data)
1028
1029 inputs[i] = self.JaggedArray(starts, stops, content)
.venv/lib/python3.7/site-packages/awkward/array/jagged.py in recurse(x)
1014 content = self.numpy.full(len(parents), x, dtype=x.dtype)
1015 else:
-> 1016 content = x.reshape(-1)[parents]
1017 return content
1018
.venv/lib/python3.7/site-packages/awkward/array/base.py in __getattr__(self, where)
254 raise AttributeError("while trying to get column {0}, an exception occurred:\n{1}: {2}".format(repr(where), type(err), str(err)))
255 else:
--> 256 raise AttributeError("no column named {0}".format(repr(where)))
257
258 def __dir__(self):
AttributeError: no column named 'reshape'
I can cause it with some operations (such as the above), but not in every case (simple division works, but dividing three terms fails). As the traceback hints, it seems to be related to processing chunks - if I load the data using arrays(...)
, the operation works fine. Perhaps I've missed a detail, but I'm at a loss why this doesn't work. Any suggestions would be greatly appreciated!
You're seeing something real, but I'm going to recommend sticking with non-lazy arrays. The reason it's failing is because some jagged operations assumed that the contents are NumPy-like (with a reshape
method), but chunked arrays fail to satisfy this contract in some ways. (Lazy arrays are "chunked virtual": the chunking is for files/baskets and virtual for delayed reading.) Some of these bugs have already been patched, but it's a deep rabbit hole.
It was exactly this inconsistency that motivated a complete rewrite of Awkward Array, which is nearly finished. I'm scheduled to port Uproot to the new Awkward Array in April. The quickest way to fix all of these issues is probably to focus on the update.
Lazy arrays are a more implicit way to do what uproot.iterate
does. Are you able to get your analysis work done with that?
Thanks for your super rapid and detailed response! (as always!)
To back up a bit, I ran into this issue because I was trying to work around my other issue related to doubly jagged arrays requiring a python loop. My goal was to calculate them once, and then store them in HDF5. The persistvirtual
functionality of lazyarrays
seemed ideal, so I was trying to use that to avoid reinventing the wheel.
Since this won't work until uproot 4 / awkward 1, I think I can use iterate for my analysis - I'll just work around it by calculating and converting everything to HDF5 straightaway.
Thanks for all your efforts! Even with hitting a few edge cases, these packages are tremendously useful! From my perspective, this can be closed, but I'll leave it up to you to decide since it won't be resolved until uproot 4.
Thanks for letting me know! It will be solved in a way that won't necessarily be recognizable: jagged array implementations simply won't be using functions like reshape
, so I might want to close it now so that I don't forget or have to reevaluate whether it's relevant. On the other hand, other users might encounter this bug and I want to be l make it easier for them to find this recommendation, so for that reason, I'll leave it open until Uproot 4.
I've decided to close it because I'm trying to figure out what, exactly, is outstanding.