Example when using Dask Gateway cluster (remote workers)
Is your feature request related to a problem? Please describe.
I'm not sure how to write the image files to scratch when using a remote Dask cluster (e.g. a Dask Gateway cluster, where the workers run on Kubernetes and can't see the local filesystem).
https://nbviewer.org/gist/rsignell/9cd38657ca9ad20e55799bf41cb80de6
Is there a way to use streamjoy already with this type of Dask cluster?
If not, what would be the best way?
Can the scratch images be stored in memory or on object storage?
Ooh yeah that's a good point; potentially fsspec? I don't have a cluster to try this though :(
Are there public hubs I could try experimenting with?
A potential workaround is using in_memory=True:
import xarray as xr
from streamjoy import stream
ds = xr.tutorial.open_dataset("air_temperature").isel(time=slice(0, 100))
airt_stream = stream(ds, cmap="RdBu_r", var='air', dim='time', max_frames=10, in_memory=True)
airt_stream.write('foo.mp4')
Eventually, I think I could support remote_dir and, if set, read from there.
Andrew,
The in_memory=True option worked great, and I guess would always be the preferred solution if you can fit all the images in RAM, right? (I can't believe how blazing fast that is -- I could barely open the Dask dashboard in time!)
For cases where the frames exceeded RAM, it would indeed be nice to write/read from object storage.
If you would like to test, I added you as a user on the ESIP Nebari deployment, so you should be able to run this using your github credentials: https://nebari.esipfed.org/hub/user-redirect/lab/tree/shared/users/rsignell/notebooks/Genoa_demo/streamjoy.ipynb
I added a few cells at the end to show how to write to the ESIP S3 bucket.
> you can fit all the images in RAM, right?
No, it's only the unsaved images; once they are saved, the image in memory should get discarded.
Glad to hear it works!
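To illustrate the point about only unsaved images being held in memory, here is a minimal sketch of the general pattern. It is not streamjoy's actual implementation: `render_frames` and `write_streaming` are hypothetical names, the frames are placeholder bytes rather than real images, and `io.BytesIO` stands in for the output video file.

```python
import io


def render_frames(n_frames, frame_size=1024):
    """Yield one placeholder 'image' (raw bytes) at a time instead of a full list."""
    for i in range(n_frames):
        yield bytes([i % 256]) * frame_size


def write_streaming(frames, sink):
    """Write each frame as soon as it is ready, then drop the reference to it."""
    written = 0
    for frame in frames:
        sink.write(frame)
        written += 1
        del frame  # the frame is no longer needed once it has been written
    return written


sink = io.BytesIO()  # stand-in for the output file (hypothetical)
count = write_streaming(render_frames(10), sink)
print(count, len(sink.getvalue()))
```

Because the frames come from a generator and each one is released right after it is written, only the frames that have not yet been saved need to be alive at any moment.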
Okay, I was able to access Nebari, but not the S3 bucket, due to PermissionError: Access Denied
and also
---> 52 os.environ['aws_access_key_id'.upper()]=cp[profile]['aws_access_key_id']
53 os.environ['aws_secret_access_key'.upper()]=cp[profile]['aws_secret_access_key']
54 os.environ['aws_profile'.upper()]=profile
I tried to apply a patch to the best of my knowledge without testing it; I'd appreciate it if you could test this: #18
> if you can fit all the images in RAM
Found a bug in MP4: I realized I forgot to del the image after writing it out; fixed here: #20
Thanks for helping me test!
Grrr.... I forgot to give you the welcome instructions that include the step to copy the AWS credentials into your ~/.aws folder. I just did it as admin, so the streamjoy notebook should work in your account now...
Okay great! I was able to get it to run partially, but I didn't have enough time to investigate
ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
I think it's just a matter of prefixing the files with something like s3://esip-hub/. I'll play around with it more later.
By the way, I was able to install the streamjoy branch with pip install git+https://github.com/ahuang11/streamjoy@fsspec_fs
v0.0.4 is now released. It should support fsspec now; docs here:
https://ahuang11.github.io/streamjoy/best_practices/#use-fsspec-to-readwrite-intermediate-files-on-a-remote-filesystem
import xarray as xr
import fsspec
from streamjoy import stream
ds = xr.tutorial.open_dataset("air_temperature").isel(time=slice(0, 1000))
fs = fsspec.filesystem('s3', anon=False)
airt_stream = stream(ds, fsspec_fs=fs, scratch_dir="esip-qhub/ahuang/streamjoy_scratch", max_frames=-1)
airt_stream.write("test.mp4")
Encountering:
TypeError: resolve_uri() got an unexpected keyword argument 'fsspec_fs'
I think the cluster's workers need streamjoy>=0.0.4 too.
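One way to check that suspicion is to ask each worker which version it has installed. The sketch below assumes `client` is a `dask.distributed.Client` already connected to the Gateway cluster and relies on `Client.run`, which executes a function on every worker and returns a dict keyed by worker address. The helper names are hypothetical, and the version comparison is deliberately crude (`packaging.version` would be more robust).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="streamjoy"):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def numeric_prefix(version_string):
    """Leading numeric components only, e.g. '0.0.4.dev1' -> (0, 0, 4)."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def outdated_workers(client, package="streamjoy", minimum="0.0.4"):
    """Map worker address -> version for workers below `minimum` (None = not installed)."""
    versions = client.run(installed_version, package)  # runs on every worker
    floor = numeric_prefix(minimum)
    return {
        address: found
        for address, found in versions.items()
        if found is None or numeric_prefix(found) < floor
    }
```

With a connected client, `outdated_workers(client)` should return an empty dict when every worker meets the minimum; any entries point at workers whose image or environment still needs the newer streamjoy.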
@ahuang11 I couldn't get this to work with Dask Gateway. I see the frames are written, but getting this error:
https://nebari.esipfed.org/hub/user-redirect/lab/tree/shared/users/rsignell/notebooks/Genoa_demo/streamjoy.ipynb
v0.0.5 is now out! I think it should work once the workers also have the latest version.