`to_dask()` not lazy when `simplecache::` in urlpath
aaronspring opened this issue · 1 comments
When loading with `to_dask()` and caching as in pangeo-data/pangeo-datastore#113, `fsspec.open_local` first downloads the whole dataset and only then opens the data in xarray, still with chunks, but after having spent the time on downloading.
Is there a way to circumvent this in intake-xarray, or is this a consequence of fsspec caching that cannot be changed for intake-xarray?
It would be great to just call `to_dask()` without spending the time to download, and only cache when xarray runs `compute`.
Whilst this may be possible, it would be tricky. Dask wants to open the file to assess the chunking; in theory, that could be done on the original file, with caching happening only when the data are actually loaded. There is a block-wise cacher in fsspec, which downloads only the parts of a file that are accessed, as they are accessed, but it only works with a library that expects Python file-like objects (i.e., there's a reason to call `open_local`: the library wants a real local file). You could do something with FUSE, where the file looks real to the OS but uses block-wise chunking internally; I'm pretty sure this kind of thing has never been tried.
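To make the difference between the two caching styles concrete, here is a toy sketch using only the standard library. It is not fsspec's implementation: `Remote`, `whole_file_cache` and `BlockCachedFile` are hypothetical names invented for illustration. In fsspec, the corresponding behaviours are selected with the `simplecache::`/`filecache::` prefixes (whole file, real local path) and the `blockcache::` prefix (blocks fetched on access, file-like object only).

```python
import os
import tempfile


class Remote:
    """Hypothetical stand-in for a remote store; counts bytes transferred."""

    def __init__(self, data: bytes):
        self.data = data
        self.bytes_sent = 0

    def fetch(self, start: int, end: int) -> bytes:
        chunk = self.data[start:end]
        self.bytes_sent += len(chunk)
        return chunk


def whole_file_cache(remote: Remote, cache_dir: str) -> str:
    """Whole-file style: copy everything up front, return a real local path."""
    path = os.path.join(cache_dir, "cached.bin")
    with open(path, "wb") as f:
        f.write(remote.fetch(0, len(remote.data)))
    return path


class BlockCachedFile:
    """Block-wise style: a file-like reader that fetches blocks only when read."""

    def __init__(self, remote: Remote, blocksize: int = 4):
        self.remote = remote
        self.blocksize = blocksize
        self.blocks = {}  # block index -> bytes already fetched
        self.pos = 0

    def seek(self, pos: int) -> int:
        self.pos = pos
        return pos

    def read(self, n: int) -> bytes:
        end = min(self.pos + n, len(self.remote.data))
        if end <= self.pos:
            return b""
        bs = self.blocksize
        first, last = self.pos // bs, (end - 1) // bs
        for i in range(first, last + 1):
            if i not in self.blocks:  # fetch each block at most once
                self.blocks[i] = self.remote.fetch(i * bs, (i + 1) * bs)
        joined = b"".join(self.blocks[i] for i in range(first, last + 1))
        offset = self.pos - first * bs
        out = joined[offset:offset + (end - self.pos)]
        self.pos = end
        return out
```

The whole-file variant can hand a path to any library that insists on a real local file, at the cost of transferring everything before the first byte is read. The block-wise variant transfers only what is touched, but the consumer has to accept a Python file-like object, which is the constraint described above.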