scikit-hep/uproot5

File-opening functions (e.g. `uproot.open`) should complain about unrecognized arguments

jpivarski opened this issue · 0 comments

The processing chain for file opening ends at the Source constructors, so every Source constructor should raise an error if any arguments go unused. Either each step along the way pops arguments as it uses them (fragile), or each Source checks against a set of universal arguments plus its own particular arguments (possibly redundant).
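To make the second option concrete, here is a minimal sketch, not uproot's actual code. The class names `SketchSource`, `SketchFileSource`, and `SketchHTTPSource` are hypothetical stand-ins, and the split of option names into "universal" and "particular" sets is an assumption made only for illustration.

```python
class SketchSource:
    """Hypothetical base class; not uproot's real Source."""

    # Option names assumed to be meaningful for every Source (illustrative split).
    universal_options = frozenset({"timeout", "num_workers", "use_threads"})
    # Option names meaningful only for this class; subclasses override this.
    particular_options = frozenset()

    def __init__(self, file_path, **options):
        allowed = self.universal_options | self.particular_options
        unused = sorted(set(options) - allowed)
        if unused:
            # Strict behavior: anything this Source would not use is an error.
            raise TypeError(
                f"{type(self).__name__} got unrecognized options: {', '.join(unused)}"
            )
        self.file_path = file_path
        self.options = options


class SketchFileSource(SketchSource):
    # Assumed to need nothing beyond the universal options.
    pass


class SketchHTTPSource(SketchSource):
    # Assumed HTTP-specific options, taken from the names in this issue.
    particular_options = frozenset({"http_max_header_bytes", "num_fallback_workers"})
```

With this strict check, passing `http_max_header_bytes` to `SketchFileSource` raises a `TypeError`, which is exactly the behavior that could break workflows that switch between Source classes.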

This might break people's workflows if they want to switch between different Source classes without changing their arguments.

Maybe a weak version of this would have the Source constructor complain if an argument name would not be recognized by any Source class. Arguments that are useless for this Source but meaningful for another would be accepted. I guess that's the best way to do this.
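A rough sketch of that weak check follows. The function name `check_option_names` and the constant `KNOWN_OPTION_NAMES` are hypothetical; the names in the set are copied from the `open.defaults` listing in this issue. The "did you mean" hint via `difflib` is an extra, not something the issue asks for.

```python
import difflib

# Union of option names over all Source classes (copied from open.defaults).
KNOWN_OPTION_NAMES = frozenset(
    {
        "handler",
        "timeout",
        "max_num_elements",
        "num_workers",
        "use_threads",
        "num_fallback_workers",
        "begin_chunk_size",
        "minimal_ttree_metadata",
        "http_max_header_bytes",
    }
)


def check_option_names(options, known=KNOWN_OPTION_NAMES):
    """Raise TypeError if any option name is unknown to every Source class.

    Options that are useless for the current Source but meaningful for
    another one pass silently, because they are in the union set.
    """
    unknown = sorted(set(options) - known)
    if not unknown:
        return
    parts = []
    for name in unknown:
        # Suggest the closest known name, if there is a reasonably close one.
        close = difflib.get_close_matches(name, sorted(known), n=1)
        hint = f" (did you mean {close[0]!r}?)" if close else ""
        parts.append(f"{name!r}{hint}")
    raise TypeError("unrecognized options: " + ", ".join(parts))
```

Each Source constructor could call this once on its `**options`, so a misspelling like `timout=5` fails loudly while `http_max_header_bytes=...` passed to a non-HTTP Source is still accepted.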

This is the complete set of options (the union over all Source classes):

    open.defaults = {
        "handler": None,
        "timeout": 30,
        "max_num_elements": None,
        "num_workers": 1,
        "use_threads": sys.platform != "emscripten",
        "num_fallback_workers": 10,
        "begin_chunk_size": 403,  # the smallest a ROOT file can be
        "minimal_ttree_metadata": True,
        "http_max_header_bytes": 21784,
    }

and the regular arguments would also have to be allowed:

    def open(
        path: str | Path | IO | dict[str | Path | IO, str],
        *,
        object_cache=100,
        array_cache="100 MB",
        custom_classes=None,
        decompression_executor=None,
        interpretation_executor=None,
        **options,
    ):

and

    def concatenate(
        files,
        expressions=None,
        cut=None,
        *,
        filter_name=no_filter,
        filter_typename=no_filter,
        filter_branch=no_filter,
        aliases=None,
        language=uproot.language.python.python_language,
        decompression_executor=None,
        interpretation_executor=None,
        library="ak",
        ak_add_doc=False,
        how=None,
        custom_classes=None,
        allow_missing=False,
        **options,
    ):

    def iterate(
        files,
        expressions=None,
        cut=None,
        *,
        filter_name=no_filter,
        filter_typename=no_filter,
        filter_branch=no_filter,
        aliases=None,
        language=uproot.language.python.python_language,
        step_size="100 MB",
        decompression_executor=None,
        interpretation_executor=None,
        library="ak",
        ak_add_doc=False,
        how=None,
        report=False,
        custom_classes=None,
        allow_missing=False,
        **options,
    ):

and

    def dask(
        files,
        *,
        filter_name=no_filter,
        filter_typename=no_filter,
        filter_branch=no_filter,
        recursive=True,
        full_paths=False,
        step_size=unset,
        steps_per_file=unset,
        library="ak",
        ak_add_doc=False,
        custom_classes=None,
        allow_missing=False,
        open_files=True,
        form_mapping=None,
        allow_read_errors_with_report=False,
        known_base_form=None,
        decompression_executor=None,
        interpretation_executor=None,
        **options,
    ):

(Those are all of the file-opening functions. I don't think I've left any out.)
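Rather than maintaining the allowed-name list by hand, one possibility is to derive it from the signatures themselves. The sketch below assumes that approach; `keyword_names`, `allowed_names`, `open_stub`, and `iterate_stub` are hypothetical, and the stubs are heavily abbreviated stand-ins for the real functions above.

```python
import inspect


def keyword_names(func):
    """Names of explicitly declared parameters that can be passed by keyword."""
    kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return {
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind in kinds  # skips *args and the **options catch-all
    }


def allowed_names(functions, option_defaults):
    """Union of the option-default keys and every function's keyword names."""
    names = set(option_defaults)
    for func in functions:
        names |= keyword_names(func)
    return names


# Hypothetical, abbreviated stand-ins for the file-opening functions.
def open_stub(path, *, object_cache=100, array_cache="100 MB", **options):
    ...


def iterate_stub(files, expressions=None, cut=None, *, step_size="100 MB", **options):
    ...
```

Calling `allowed_names([open_stub, iterate_stub], {"timeout": 30})` yields a set that includes `timeout`, `object_cache`, and `step_size` but not the `options` catch-all itself, so new keyword arguments added to a signature would be picked up automatically.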