scikit-hep/uproot5

2D Array causes `uproot.dask` to try to pickle a thread lock

gordonwatts opened this issue · 5 comments

The following crashes with the dump below, failing to pickle a thread lock. If the indicated lines that create a client are commented out, the code runs just fine. Inspecting the file shows that the culprit is a 2D branch (jet_EnergyPerSampling); the 1D branches do not exhibit this behavior. The data file is on CERNBox (thanks to @alexander-held).

import uproot
import awkward as ak
import multiprocessing
from dask.distributed import Client, LocalCluster


# Comment starting here and the script will work
n_workers = multiprocessing.cpu_count()
print(n_workers)
n_workers = 2  # override with a small fixed worker count for the reproducer
cluster = LocalCluster(
    n_workers=n_workers, processes=False, threads_per_worker=1
)
client = Client(cluster)
# End comment out block

data = uproot.dask({
    "/data/gwatts/test.root": "atlas_xaod_tree"
})

total = ak.count(data.jet_pt)

print(total.compute())

The crash:

(venv) [bash][gwatts]:idap-200gbps-atlas > python servicex/test.py 
96
2024-04-05 23:43:58,648 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 2 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f8890386e20>
 0. <dask-awkward.lib.core.ArgsKwargsPackedFunction ob-37764e1a1a7ed10022ea822573c2855b
 1. count-finalize-4f7cc442fca58343712aedce024e98e7
>.
Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle '_thread.lock' object
Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 353, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 76, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle '_thread.lock' object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/test.py", line 21, in <module>
    print(total.compute())
  File "/venv/lib/python3.9/site-packages/dask/base.py", line 375, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/venv/lib/python3.9/site-packages/dask/base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 379, in serialize
    raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 2 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f8890386e20>\n 0. <dask-awkward.lib.core.ArgsKwargsPackedFunction ob-37764e1a1a7ed10022ea822573c2855b\n 1. count-finalize-4f7cc442fca58343712aedce024e98e7\n>')

One additional piece of information: I assume there is something very specific about this file / how this branch was written out, because I can read the information in jet_EnergyPerSampling just fine out of the original PHYSLITE with the same uproot.dask approach. The file linked in this issue instead comes out of a ServiceX transform.

Another piece of information, thanks to a debugging suggestion from @lgray: when using

uproot.dask(..., open_files=False)

the crash disappears and the output looks sensible again.
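
For reference, applying that workaround to the reproducer above looks like the following (a sketch using the same placeholder file path and tree name):

import uproot
import awkward as ak

# open_files=False builds the task graph without opening the files up front,
# so no live file handles (which is presumably where the unpicklable lock
# lives) end up embedded in the graph.
data = uproot.dask(
    {"/data/gwatts/test.root": "atlas_xaod_tree"},
    open_files=False,
)

total = ak.count(data.jet_pt)
print(total.compute())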

I ran into this independently, and because the issue came up when cloudpickling dask.base.unpack_collections.<locals>.repack, I thought it was a Dask-core issue. But if setting open_files=False in Uproot fixes it, that makes it much more likely to be an Uproot issue.
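
For anyone else narrowing this down: one way to confirm that the unpicklable object lives in the task graph itself, independent of any Distributed client, is to cloudpickle the graph directly. A small diagnostic sketch:

import cloudpickle
import uproot
import awkward as ak

data = uproot.dask({"/data/gwatts/test.root": "atlas_xaod_tree"})
total = ak.count(data.jet_pt)

# If this raises TypeError: cannot pickle '_thread.lock' object, the lock is
# already embedded in the graph before the scheduler ever sees it.
cloudpickle.dumps(total.__dask_graph__())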

I think that #1200 solves this problem. It also explains why we hadn't seen it until now: the AsObjects Interpretation applies to ElementLinks, and the lock ensures that AwkwardForth code generators running on different threads don't compete in generating code; it should be generated once by one thread and then used by all.
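
For readers unfamiliar with that pattern, here is a generic sketch of what such a lock is for (the class and method names are made up for illustration, not uproot's actual code):

import threading

class ForthCodeCache:
    # Generate code once, under a lock, then reuse it from every thread.

    def __init__(self):
        self._lock = threading.Lock()
        self._code = None

    def get(self, generate):
        if self._code is not None:
            return self._code          # fast path: already generated
        with self._lock:
            if self._code is None:     # re-check inside the lock
                self._code = generate()
            return self._code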

The way this lock is protected during serialization exactly follows the procedure used to protect another lock in the Model superclass.
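
For context, the usual way to make such an object picklable is to drop the lock in __getstate__ and recreate it in __setstate__. A generic sketch of that procedure (not the actual Model code):

import threading

class LockOwner:
    def __init__(self):
        self._lock = threading.Lock()   # guards one-time code generation

    def __getstate__(self):
        state = dict(self.__dict__)
        del state["_lock"]              # the lock itself cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()   # fresh lock on the receiving side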

Awesome - thanks for getting this fixed!