s3fs: translation of buffering=-1 to block_size causes read failures
kevinjacobs-delfi opened this issue · 2 comments
Versions:
- fsspec==2024.9.0
- s3fs==2024.9.0
- universal_pathlib==0.2.5
This code in core.py is causing problems:
if "buffering" in fsspec_kwargs:
fsspec_kwargs.setdefault("block_size", fsspec_kwargs.pop("buffering"))
Translation from buffering
-> block_size
results in read failures for files opened with buffering=-1
(the default for opening files) when passed to functions like json.load
for files on S3.
S3FileSystem.open does not specify what happens if a negative number of passed and I'll file a bug there too.
I don't have a public bucket to post a sample file, but I'll try to find one so that we can reproduce this behavior.
However, the obvious fix is something like this:
if "buffering" in fsspec_kwargs:
buffering = fsspec_kwargs.pop("buffering")
if buffering > 0:
fsspec_kwargs.setdefault("block_size", buffering)
Thanks for reporting. This is a UPath bug, not an s3fs bug imho.
Would you be willing to make a PR with a test and your suggested fix?
Also, may I ask why you're passing the default buffering=-1 down to your UPath.open(...)
calls? I'm curious how you discovered the bug.
Cheers,
Andreas
@ap-- I'll work on a PR today.
The context here is that I have a "smart file opener" that mimics the "open" API:
@contextmanager
def smartopen(
file: FileOrPathLike,
mode: OpenMode = 'r',
buffering: int = -1, # <-------------
encoding: str | None = None,
errors: str | None = None,
newline: str | None = None,
*,
compression: CompressionTypes | Literal['infer'] | None = 'infer',
compresslevel: int = 9,
hyphen: HyphenFileTypes | None = 'std',
) -> Iterator[IO[str] | IO[bytes]]:
"""Context manager for opening and closing compressed or uncompressed text or binary files.
So buffering=-1
is the default.