xa.load_all not loading files past the first
```python
xa.load_all(directory='gs://ldeo-glaciology/GL_apres_2022', remote_load = True, file_numbers_to_process = [1], bursts_to_process=[0,1])
```
This works.
```python
xa.load_all(directory='gs://ldeo-glaciology/GL_apres_2022', remote_load = True, file_numbers_to_process = [1,1], bursts_to_process=[0,1])
```
As does this.
But:
```python
xa.load_all(directory='gs://ldeo-glaciology/GL_apres_2022', remote_load = True, file_numbers_to_process = [1,2], bursts_to_process=[0,1])
```
and
```python
xa.load_all(directory='gs://ldeo-glaciology/GL_apres_2022', remote_load = True, file_numbers_to_process = [2], bursts_to_process=[0,1])
```
do not. This seems to be the case with all the other files as well. I did make some changes, but those changes (adding uncertainty) seem to be fine when loading just the one file.
I will add my changes as a pull request shortly and also discuss them with @jkingslake - there are some structural changes and some questions about how uncertainty comes into play. I think we are close to getting a series of vertical velocity measurements.
The notebooks could be out of date in terms of which directories they are looking in (I moved the ApRES DAT files around in the bucket towards the end of all that work).
The code snippet here has the directory to load from.
Can you try the same tests loading from there?
This is loading in already-saved xarrays as zarrs, if I'm interpreting this correctly? Does this mean we are sticking to the existing xarray structure, meaning that I shouldn't make any changes to functions like `_burst_to_xarray` in the xarray class? This is fine; I can adapt to using this structure. I had initially been writing my update to include uncertainty in the xarray from the stage where we're still loading in the DAT files.
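If so, loading one of the saved datasets would look something like this (a minimal sketch; the store path below is a placeholder, not the real location, and it assumes gcsfs is installed so xarray/zarr can read gs:// URLs directly):

```python
import xarray as xr

# Open an already-saved dataset stored as zarr on the bucket.
# NOTE: this path is a placeholder, not the real store location.
ds = xr.open_zarr("gs://ldeo-glaciology/some_saved_apres_dataset.zarr")

# Opening is lazy: only metadata is read until values are actually computed.
print(ds)
```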
oh yeah, you are right!
What's the traceback for the ones which don't work?
I think I've reverted to the existing scripts, and it still hangs.
```python
filepaths = xa.list_files(directory='gs://ldeo-glaciology/GL_apres_2022', remote_load = True)
```
This shows that there are 386 files available
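For reference, a quick sanity check like this shows which files the failing indices point at (a sketch; I'm assuming list_files returns a plain list and that file_numbers_to_process indexes into it zero-based, which I haven't verified):

```python
# Confirm the file count and inspect the entries that fail to load.
print(len(filepaths))   # expected: 386
print(filepaths[1])
print(filepaths[2])
```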
```python
xa = ApRESDefs.xapres(loglevel='debug', max_range=1400)
xa.load_all(directory='gs://ldeo-glaciology/GL_apres_2022', remote_load = True, file_numbers_to_process = [1,2], bursts_to_process=[0,1])
```
This is the script that gets stuck, specifically in ApRESDefs.py, in load_all at line 230, at the log message `Load dat file ldeo-glaciology/GL_apres_2022/A101/CardA/DIR2022-05-26-1536/DATA2022-05-27-1506.DAT`.
There's no traceback; it just doesn't continue past this point. After some digging, it looks like load_dat_file is where it gets stuck.
It's odd because the first file loads fine. If you have the time, can you check whether you're able to load it as well? It could just be some odd network issue or something.
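One way to separate a slow transfer from a genuine hang would be to time a raw read of the file named in that log line, bypassing load_dat_file entirely. A minimal sketch, assuming gcsfs is installed and read access to the bucket is configured:

```python
import time
import fsspec

# The file that load_all appears to hang on, taken from the debug log above.
path = "gs://ldeo-glaciology/GL_apres_2022/A101/CardA/DIR2022-05-26-1536/DATA2022-05-27-1506.DAT"

t0 = time.time()
with fsspec.open(path, "rb") as f:
    data = f.read()  # pull the whole file to measure transfer time
print(f"read {len(data) / 1e6:.1f} MB in {time.time() - t0:.1f} s")
```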
OK, it seems I am just impatient; it just takes a while to load (still unusual, it doesn't normally take this long). Generally resolved, though.
I think, with the speed being what it is, it makes more sense to leave the already-created xarrays on the bucket and just move forward with loading those and doing additional processing for my analysis?
OK, I was just writing back to say that it is working for me, but the second file is larger, so maybe it's taking longer.
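To check the size explanation, something like this would compare the two files directly (a sketch; it assumes gcsfs is installed, that filepaths is the list from list_files above, and the same zero-based indexing as file_numbers_to_process):

```python
import fsspec

# Compare the on-bucket sizes of the two files to see whether
# the slower one is simply larger.
fs = fsspec.filesystem("gs")
for path in filepaths[1:3]:
    print(path, f"{fs.size(path) / 1e6:.1f} MB")
```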
> I think, with the speed being what it is, it makes more sense to leave the already-created xarrays on the bucket and just move forward with loading those and doing additional processing for my analysis?
It does take 12 hours or something to run the whole thing, but does the speed make a big difference to you? You could test the new capabilities on a subset and then rerun the whole lot when you're happy, right?
Yeah, that was my plan initially. I'll keep doing what I have so far and adapt it if needed. I think either way will end up taking more or less the same amount of time.