pytroll/satpy

Question: How to release loaded data from memory?


I'm running a batch job in a Python script. The process is: move a dataset into data_folder → process it to GeoTIFF → move it out → next iteration. But I get WinError 32 when I try to move or delete the finished dataset. So is there anything like file.close()?

Is there any way you can provide us some example code? It is really hard to say without knowing how you are using Satpy. I also am not sure what WinError 32 means (I haven't used Windows for 10+ years). Is there a python traceback when you get the error?

In general Satpy will attempt to close or release things when the Scene object(s) are garbage collected. There are some special cases though that depend on the reader you are using. Some readers use xarray to open files, and xarray caches open file objects, so you don't have a ton of control over when files are released/closed. For creating GeoTIFFs, Satpy will generally close the output file immediately after writing it unless you are doing something very hacky/fancy.

Here are some of my lines. To keep it easy to read, I'll use just a single dataset.

import os
import shutil

from satpy import Scene, find_files_and_readers

# process_folder and done_folder are defined elsewhere in the script

# use a system command to list the files in the backup folder
dataset_response = os.popen("dir /b /a-d gk2a_ami_le1b_vi006_fd005ge_*.nc")

# loop
for gk2afile in dataset_response:
    # each line read from os.popen ends with a newline, so strip it
    vi006 = gk2afile.strip()

    # move the dataset to process_folder
    shutil.move(vi006, process_folder)

    # SatPy part
    files = find_files_and_readers(base_dir=process_folder,
                                   reader='ami_l1b')
    R_scn = Scene(filenames=files)
    R = 'Rdaynight'
    R_scn.load([R])
    R_scn.save_dataset(R, filename='gk2a_{name}_{start_time:%Y%m%d_%H%M}.tif', writer='geotiff')

    # move the finished dataset out to done_folder
    vi006_done = os.path.join(process_folder, vi006)
    shutil.move(vi006_done, done_folder)

Everything is fine until the last move step, where I get this error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: THAT DATASET FILE. It looks like the dataset is still open and has not been closed.

I ran exactly the same lines (except for the os command) under Linux and nothing went wrong. So I'm guessing this has something to do with the different memory mechanisms of Windows and Linux.

Well, this issue is no big deal for me. If it can be solved, that would be better. If not, I'll just go with Linux.

To be honest, most of the Satpy core developers are Linux users and do operational processing on Linux servers, so we wouldn't have run into this ourselves. That said, it should work on Windows too.

To force the files to be released, try adding this right before the last shutil.move:

del R_scn
import gc
gc.collect()

This should force garbage collection. Not great to use, but in this case with the caching that might be going on in the background with rasterio and the fact that Windows likes to be very annoying about open files...maybe it is needed as a workaround.
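To show why plain `del` may not be enough on its own, here is a minimal sketch that does not use Satpy at all. `FakeScene` and `process_and_move` are hypothetical names made up for this illustration. The class just holds an open file and sits in a reference cycle, which is one assumed way an object can stay alive after `del`. It is not a description of what Satpy or rasterio actually do internally.

```python
import gc
import os
import shutil
import tempfile


class FakeScene:
    """Hypothetical stand-in for an object that keeps a file open."""

    def __init__(self, path):
        self.handle = open(path, "rb")
        # Reference cycle: after `del`, only the garbage collector
        # can reclaim this object.
        self.me = self

    def __del__(self):
        # Release the file when the object is finally collected.
        self.handle.close()


def process_and_move(src, done_folder):
    """Read a file through FakeScene, release it, then move it."""
    scn = FakeScene(src)
    handle = scn.handle  # kept only so the handle state can be inspected
    scn.handle.read()

    del scn        # drops our name, but the cycle keeps the object alive
    gc.collect()   # collects the cycle, which runs __del__ and closes the file

    dest = shutil.move(src, done_folder)
    return handle.closed, dest


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "sample.nc")
    with open(src, "wb") as f:
        f.write(b"data")
    done = os.path.join(work, "done")
    os.mkdir(done)
    print(process_and_move(src, done))
```

On Linux the move would succeed even with the handle still open, which matches the behaviour reported above. On Windows the open handle is what triggers WinError 32, so the handle has to be closed before `shutil.move`.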

This del works. Thank you!