scikit-hep/uproot5

xrootd read fails for large arrays

ponyisi opened this issue · 6 comments

I am running into a problem with arrays() when reading from xrootd:

>>> import uproot
>>> 
>>> t=uproot.open({'root://lcg-lrz-rootd.grid.lrz.de:1094/pnfs/lrz-muenchen.de/data/atlas/dq2/atlasscratchdisk/rucio/user/mtost/f7/4c/user.mtost.38576164._000004.output_newp.root': 'reco'})
>>> t.arrays(filter_name='/mu_.*/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/behaviors/TBranch.py", line 823, in arrays
    _ranges_or_baskets_to_arrays(
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3105, in _ranges_or_baskets_to_arrays
    uproot.source.futures.delayed_raise(*obj)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/source/futures.py", line 38, in delayed_raise
    raise exception_value.with_traceback(traceback)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3026, in chunk_to_basket
    basket = uproot.models.TBasket.Model_TBasket.read(
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/model.py", line 854, in read
    self.read_members(chunk, cursor, context, file)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/models/TBasket.py", line 227, in read_members
    ) = cursor.fields(chunk, _tbasket_format1, context)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/source/cursor.py", line 201, in fields
    return format.unpack(chunk.get(start, stop, self, context))
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/source/chunk.py", line 446, in get
    self.wait(insist=stop)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/source/chunk.py", line 388, in wait
    self._raw_data = numpy.frombuffer(self._future.result(), dtype=self._dtype)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/uproot/source/fsspec.py", line 28, in result
    return self._parent.result(timeout=timeout)[self._part_index]
  File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/fsspec_xrootd/xrootd.py", line 641, in _cat_ranges
    results = await _run_coros_in_chunks(coros, batch_size=batch_size, nofiles=True)
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/fsspec/asyn.py", line 268, in _run_coros_in_chunks
    result, k = await done.pop()
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/fsspec/asyn.py", line 245, in _run_coro
    return await asyncio.wait_for(coro, timeout=timeout), i
  File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/tmp/ponyisi/venv/lib64/python3.9/site-packages/fsspec_xrootd/xrootd.py", line 601, in _cat_vector_read
    raise OSError(f"File did not vector_read properly: {status.message}")
OSError: File did not vector_read properly: [ERROR] Server responded with an error: [3002] Single readv transfer is too large.

I've put the file at https://cernbox.cern.ch/s/aLg8sfkoTvgqB9F as well, but obviously this needs to be run over xrootd to reproduce.

Versions: uproot 5.3.7, fsspec 2024.3.1, fsspec_xrootd 0.3.0, xrootd 5.6.9.

This is using the fsspec-xrootd backend. You can try the old (pre-fsspec) backend by passing handler=uproot.XRootDSource to uproot.open.

The next thing to try is cutting down the size of the request, either by limiting the number of entries with entry_start and entry_stop or by limiting the number of branches with filter_branch (or a narrower filter_name) in TTree.arrays.
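A minimal sketch of the entry-limiting suggestion: read the tree in windows so that each call requests fewer baskets. The helper names `entry_windows` and `read_in_windows` and the step size are illustrative assumptions, not part of Uproot; only `entry_start`, `entry_stop`, `filter_name`, and `num_entries` come from the TTree interface.

```python
def entry_windows(num_entries, step):
    """Yield (entry_start, entry_stop) pairs that cover [0, num_entries)."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, num_entries, step):
        yield start, min(start + step, num_entries)


def read_in_windows(tree, step, **kwargs):
    """Call tree.arrays once per window and collect the pieces in a list.

    `tree` is assumed to be an uproot TTree; any object with `num_entries`
    and an `arrays` method accepting entry_start/entry_stop would work.
    """
    return [
        tree.arrays(entry_start=start, entry_stop=stop, **kwargs)
        for start, stop in entry_windows(tree.num_entries, step)
    ]
```

With the tree from the report this might be called as `read_in_windows(t, 50_000, filter_name='/mu_.*/')`; the right step size depends on the server and has to be found by trial.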

This is an XRootD error. A different server might even have a different cut-off for what it considers too big of a request.

I'll give that a try (I do know that asking for one branch only stops the error from occurring).

Not really sure I agree that it's an xrootd error, though: as a user I am asking for data to materialize in my program, and the details of exactly how the network access is being performed or how big chunk sizes are should not be my concern. I would expect the underlying libraries to manage the requests appropriately; I am not sure how e.g. ServiceX can reliably work if we have to implement the recovery logic on our side.

If this were to be done in Uproot, we'd have to dynamically adjust the request size and resubmit when we see such an error, and have some reasonable give-up policy after a specified number of retries to keep it from spiraling out of control. And if the OSError came from a local file, rather than XRootD, it's not retryable (if an open local file suddenly can't be read, someone must have unplugged the disk or something).
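The retry policy described above could look roughly like the following. This is a standalone sketch, not Uproot code: `read_with_shrinking_requests`, its parameters, and the "too large" substring check are all assumptions made for illustration. Errors that do not match the check (for example from a local file) are re-raised immediately.

```python
def read_with_shrinking_requests(
    read,
    request_size,
    min_size=1,
    max_retries=5,
    is_retryable=lambda err: "too large" in str(err),
):
    """Call read(request_size), halving request_size after each retryable OSError.

    Gives up (re-raises) when the error is not retryable, when max_retries
    retries have been used, or when request_size cannot shrink any further.
    """
    attempts = 0
    while True:
        try:
            return read(request_size)
        except OSError as err:
            attempts += 1
            if (
                not is_retryable(err)
                or attempts > max_retries
                or request_size <= min_size
            ):
                raise
            # Shrink the request and try again.
            request_size = max(min_size, request_size // 2)
```

Here `read` stands for whatever performs one complete attempt; any state it fills in (such as an arrays dict) would have to be rebuilt inside it on every call so that a partial failure leaves nothing behind.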

Possibly the smallest sections that would need to be retried are:

in HasBranches.arrays:

arrays, expression_context, branchid_interpretation = _regularize_expressions(
    self,
    expressions,
    cut,
    filter_name,
    filter_typename,
    filter_branch,
    keys,
    aliases,
    language,
    get_from_cache,
)
ranges_or_baskets = []
checked = set()
for _, context in expression_context:
    for branch in context["branches"]:
        if branch.cache_key not in checked:
            checked.add(branch.cache_key)
            for (
                basket_num,
                range_or_basket,
            ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
                ranges_or_baskets.append((branch, basket_num, range_or_basket))
interp_options = {"ak_add_doc": ak_add_doc}
_ranges_or_baskets_to_arrays(
    self,
    ranges_or_baskets,
    branchid_interpretation,
    entry_start,
    entry_stop,
    decompression_executor,
    interpretation_executor,
    library,
    arrays,
    False,
    interp_options,
)

in HasBranches.iterate:

ranges_or_baskets = []
checked = set()
for _, context in expression_context:
    for branch in context["branches"]:
        if branch.cache_key not in checked:
            checked.add(branch.cache_key)
            for (
                basket_num,
                range_or_basket,
            ) in branch.entries_to_ranges_or_baskets(
                sub_entry_start, sub_entry_stop
            ):
                previous_basket = previous_baskets.get(
                    (branch.cache_key, basket_num)
                )
                if previous_basket is None:
                    ranges_or_baskets.append(
                        (branch, basket_num, range_or_basket)
                    )
                else:
                    ranges_or_baskets.append(
                        (branch, basket_num, previous_basket)
                    )
arrays = {}
interp_options = {"ak_add_doc": ak_add_doc}
_ranges_or_baskets_to_arrays(
    self,
    ranges_or_baskets,
    branchid_interpretation,
    sub_entry_start,
    sub_entry_stop,
    decompression_executor,
    interpretation_executor,
    library,
    arrays,
    True,
    interp_options,
)

in TBranch.array:

arrays = {}
expression_context = []
branchid_interpretation = {}
_regularize_branchname(
    self,
    self.name,
    self,
    interpretation,
    get_from_cache,
    arrays,
    expression_context,
    branchid_interpretation,
    True,
    False,
)
ranges_or_baskets = []
checked = set()
for _, context in expression_context:
    for branch in context["branches"]:
        if branch.cache_key not in checked and not isinstance(
            branchid_interpretation[branch.cache_key],
            uproot.interpretation.grouped.AsGrouped,
        ):
            checked.add(branch.cache_key)
            for (
                basket_num,
                range_or_basket,
            ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
                ranges_or_baskets.append((branch, basket_num, range_or_basket))
interp_options = {"ak_add_doc": ak_add_doc}
_ranges_or_baskets_to_arrays(
    self,
    ranges_or_baskets,
    branchid_interpretation,
    entry_start,
    entry_stop,
    decompression_executor,
    interpretation_executor,
    library,
    arrays,
    False,
    interp_options,
)

Maybe it's possible to retry only the _ranges_or_baskets_to_arrays call, but I'm not sure if there would be a counting problem if the arrays dict and ranges_or_baskets list are not reinitialized after a partial failure. (These are notes for implementation.) If the number of items in the arrays dict is wrong, the code will hang in this loop, rather than raise an error!

There would also need to be a way to change the granularity of the XRootD request while still requesting all the data the user wants. The Source.chunks method doesn't have a way to express that, but maybe the coalesce algorithm does? Unfortunately, coalesce arguments are specified per-FSSpecSource object, but maybe each retry could be a new, more finely granular FSSpecSource? (We don't know if another thread is using the same FSSpecSource object; we can't change it in place.)
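To make "finer granularity" concrete, one option is to split the list of byte ranges into several smaller requests, each under a byte budget. The sketch below is standalone and does not reflect how Source.chunks or the coalesce algorithm is actually implemented; `split_ranges` and `max_bytes` are hypothetical names.

```python
def split_ranges(ranges, max_bytes):
    """Group (start, stop) byte ranges into batches of at most max_bytes each.

    A single range larger than max_bytes is cut into consecutive pieces,
    so the caller would have to join those pieces again after reading.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    batches, current, current_bytes = [], [], 0
    for start, stop in ranges:
        while start < stop:
            piece_stop = min(stop, start + max_bytes)
            size = piece_stop - start
            # Close the current batch if this piece would exceed the budget.
            if current and current_bytes + size > max_bytes:
                batches.append(current)
                current, current_bytes = [], 0
            current.append((start, piece_stop))
            current_bytes += size
            start = piece_stop
    if current:
        batches.append(current)
    return batches
```

Each retry could then call this with a smaller `max_bytes` and issue one request per batch, without mutating any shared source object.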

Ah, @jpivarski is correct that this does seem to be an issue with xrootd (related to https://its.cern.ch/jira/browse/ROOT-6639, https://root-forum.cern.ch/t/error-when-streaming-rootfiles-via-xrootd/37783): it seems dCache servers lie about how big a transfer they support, and clients take them at their word at their peril. There's a workaround in ROOT, and doing something similar in Python seems to fix the problem. (In the language of uproot.XRootDSource, it's essentially setting max_num_elements to something like 2048, but only if the server returns something like 0xAAAAAA8 for this value.)
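The workaround described above can be sketched as a clamp on the advertised limit followed by batching of the vector-read elements. The function names are hypothetical, and treating every value at or above 0xAAAAAA8 as untrustworthy is an assumption; only the two numbers come from the comment above.

```python
SUSPICIOUS_MAX_ELEMENTS = 0xAAAAAA8  # value quoted above as the server's reply
FALLBACK_MAX_ELEMENTS = 2048         # conservative limit suggested above


def effective_max_num_elements(
    server_value,
    suspicious=SUSPICIOUS_MAX_ELEMENTS,
    fallback=FALLBACK_MAX_ELEMENTS,
):
    """Return the limit to use, distrusting implausibly large advertised values.

    Assumption: anything >= `suspicious` is treated as bogus.
    """
    if server_value >= suspicious:
        return fallback
    return server_value


def batch_requests(requests, max_num_elements):
    """Split a list of vector-read elements into batches of limited length."""
    return [
        requests[i : i + max_num_elements]
        for i in range(0, len(requests), max_num_elements)
    ]
```

A server that advertises a plausible limit keeps it, so the clamp only changes behaviour for the misreporting case.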

Thanks for the update! I've split out the notes on how a retry mechanism could be implemented into #1219, and I'll close this now.

Incidentally uproot 5.3.8 seems to fix this partially (I believe due to the read coalescing instead of vector reads).