scikit-hep/awkward-0.x

AttributeError when trying to read a particular format of awkward array

HenryDayHall opened this issue · 5 comments

Saving and then loading an awkward array of a particular shape, created by slicing a larger array, gives:
AttributeError: 'bytes' object has no attribute 'ctypes'

Here is an example that recreates the problem:

import os
import awkward


def test_split_unfinished():
    # clean any existing mess
    save_name = "test.awkd"
    try:
        os.remove(save_name)
    except FileNotFoundError:
        pass
    idxs = slice(1, None)
    # works ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[], []])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)
    # works ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[], 0])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)
    # works ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[0], []])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)
    # fails ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[[]], []])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)


if __name__ == "__main__":
    test_split_unfinished()

I think this must be a bug, because I don't see anything wrong with what is being attempted.

This would be a bug:

AttributeError: 'bytes' object has no attribute 'ctypes'

In fact, it sounds like a bytestring (bytes object) is being mistaken for a NumPy array (which has a ctypes attribute, from which we can get a pointer to the underlying data). If I knew where that mistake was being made, I could wrap the bytestring with np.frombuffer to view it as a NumPy array.
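A minimal illustration of that distinction, assuming only NumPy. The `raw` bytes object is made up here and stands in for data read back from a file; it is not taken from awkward's own code:

```python
import numpy as np

# Stand-in for raw data read back from a file: a plain bytes object.
raw = np.arange(3, dtype=np.int64).tobytes()
print(hasattr(raw, "ctypes"))  # False: bytes has no .ctypes attribute

# np.frombuffer views the same memory as a NumPy array, without copying.
arr = np.frombuffer(raw, dtype=np.int64)
print(arr.tolist())            # [0, 1, 2]
print(arr.flags.writeable)     # False: a view onto immutable bytes is read-only

# The array exposes the address of its data as a Python int.
ptr = arr.ctypes.data
```

Because `np.frombuffer` does not copy, wrapping is cheap, but the resulting array is read-only and keeps the bytes object alive as its `.base`.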

However, I can't find where this is happening because when I run the same commands, I don't get any error. Try this again in the latest version; you might be seeing an old bug that has since been fixed. If it's still happening, give me the exact commands (for me to try to reproduce again) and the full stack trace (which can help me find the error even if I can't reproduce it).

Note that Awkward 0 is gradually being deprecated in favor of Awkward 1, so you might not want to do new work in Awkward 0. However, Awkward 1 doesn't have file-saving yet, which is what you need here. That's a good example of why it's not an immediate transition.

Ooh, that's interesting. I think I am using the latest version of awkward (0.12.21, right?). My Python version is not the latest, however: it is 3.6.9, and I'm not sure whether that matters. I will try to reproduce the behavior in a Docker container. In the meantime, here is the output I get:

[[]]
[0]
[[]]
Traceback (most recent call last):
  File "example.py", line 33, in <module>
    found = awkward.load(save_name)
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 700, in load
    out = f[""]
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 722, in __getitem__
    return deserialize(self._file, name=where + self.schemasuffix, awkwardlib=self.options["awkwardlib"], whitelist=self.options["whitelist"], cache=self.options["cache"])
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 575, in deserialize
    return unfill(schema["schema"])
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 517, in unfill
    args = [unfill(x) for x in schema.get("args", [])]
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 517, in <listcomp>
    args = [unfill(x) for x in schema.get("args", [])]
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 527, in unfill
    out = gen(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/awkward/array/jagged.py", line 105, in __init__
    if self.offsetsaliased(starts, stops):
  File "/usr/local/lib/python3.6/dist-packages/awkward/array/jagged.py", line 29, in offsetsaliased
    starts.ctypes.data == starts.base.ctypes.data and
AttributeError: 'bytes' object has no attribute 'ctypes'

From this stack trace, here's the bit that's supposed to identify bytes (anywhere) used as an array and convert it into an array.

https://github.com/scikit-hep/awkward-array/blob/d88527c69d3070aa49db2aa9e14d9f02adb73e19/awkward/array/base.py#L380-L394

So, that's weird.

Here is a Dockerfile that reproduces the issue:
Dockerfile.zip
I'm sure you are rather better with these than I am, but on the off chance you haven't used Docker much, there are instructions in the comments at the top of the file.

Thanks! I don't know what I must have been doing differently, but your explicit file revealed the error. Not only do the starts and stops have to go through _util_toarray (above), but their starts.base and stops.base do as well. It should be fixed in PR #251.
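To make the fix concrete, here is a minimal sketch under stated assumptions. `to_array` and `offsets_aliased` are hypothetical names written for illustration; they are simplified stand-ins for `_util_toarray` and `offsetsaliased`, not awkward's actual code. The sketch assumes that `starts` and `stops` were each created by `np.frombuffer` directly on the same bytes read from a file, so each one's `.base` is that bytes object rather than an array. Converting `starts` and `stops` alone is then not enough, because the check also reads `.ctypes` from their bases:

```python
import numpy as np


def to_array(value, dtype):
    """Hypothetical helper: view a buffer-like object (e.g. bytes) as a
    NumPy array without copying; arrays pass through unchanged."""
    if isinstance(value, np.ndarray):
        return value
    return np.frombuffer(value, dtype=dtype)


def offsets_aliased(starts, stops, convert_base=True):
    """Simplified sketch of an aliasing check: do starts and stops view one
    shared offsets buffer, with stops shifted by one element?"""
    if starts.base is None or stops.base is None:
        return False
    starts_base, stops_base = starts.base, stops.base
    if convert_base:
        # The fix: the bases must be converted too, not just starts/stops.
        starts_base = to_array(starts_base, starts.dtype)
        stops_base = to_array(stops_base, stops.dtype)
    # Without conversion, reading .ctypes on a bytes base raises AttributeError.
    return (starts.ctypes.data == starts_base.ctypes.data and
            stops.ctypes.data == stops_base.ctypes.data + stops.dtype.itemsize)


# Offsets [0, 2, 5] as raw bytes, as if just read back from a file.
raw = np.array([0, 2, 5], dtype=np.int64).tobytes()
starts = np.frombuffer(raw, dtype=np.int64, count=2)            # [0, 2]
stops = np.frombuffer(raw, dtype=np.int64, count=2, offset=8)   # [2, 5]

print(offsets_aliased(starts, stops))  # True
try:
    offsets_aliased(starts, stops, convert_base=False)
except AttributeError as err:
    print(err)  # e.g. 'bytes' object has no attribute 'ctypes'
```

With the bases converted, both pointer comparisons run against arrays that view the same bytes, so the check succeeds instead of raising.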