MPAS-Dev/compass

Issue with temporary directories in ocean/mesh/cull.py


I am currently running compass on a pretty standard Linux cluster (Debian 10). I started with a version of an ocean mesh generation test case: https://mpas-dev.github.io/compass/latest/tutorials/dev_add_rrm.html#running-the-mesh-test-case.

Everything ran without fatal errors until the cull_mesh step:

      Failed
Exception raised while running the steps of the test case
Traceback (most recent call last):
  File "/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/compass/compass/run/serial.py", line 320, in _log_and_run_test
    _run_test(test_case, available_resources)
  File "/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/compass/compass/run/serial.py", line 417, in _run_test
    _run_step(test_case, step, test_case.new_step_log_file,
  File "/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/compass/compass/run/serial.py", line 468, in _run_step
    step.run()
  File "/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/compass/compass/ocean/mesh/cull.py", line 165, in run
    cull_mesh(with_critical_passages=True, logger=logger,
  File "/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/compass/compass/ocean/mesh/cull.py", line 241, in cull_mesh
    _cull_mesh_with_logging(
  File "/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/compass/compass/ocean/mesh/cull.py", line 361, in _cull_mesh_with_logging
    dsCulledMesh = cull(dsBaseMesh, dsMask=dsLandMask,
  File "/home/nheavens/mambaforge/envs/dev_compass_1.2.0-alpha.6_mpich/lib/python3.10/site-packages/mpas_tools/mesh/conversion.py", line 112, in cull
    with TemporaryDirectory(dir=dir) as tempdir:
  File "/home/nheavens/mambaforge/envs/dev_compass_1.2.0-alpha.6_mpich/lib/python3.10/tempfile.py", line 869, in __exit__
    self.cleanup()
  File "/home/nheavens/mambaforge/envs/dev_compass_1.2.0-alpha.6_mpich/lib/python3.10/tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/home/nheavens/mambaforge/envs/dev_compass_1.2.0-alpha.6_mpich/lib/python3.10/tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/home/nheavens/mambaforge/envs/dev_compass_1.2.0-alpha.6_mpich/lib/python3.10/shutil.py", line 731, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/home/nheavens/mambaforge/envs/dev_compass_1.2.0-alpha.6_mpich/lib/python3.10/shutil.py", line 729, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/flat6/Environmental_Data_Hub/UKCM_Code_Test_Repository/ocean_meshes/QU_27_5/ocean/global_ocean/QU27_5/mesh/cull_mesh/tmplkit3_ph'

Going through the traceback, I was able to trace this to line 361 of ocean/mesh/cull.py. I was able to run the equivalent MPAS-Tools command from the command line. The issue seems to be with the optional temporary directory that the cull function defined in mpas_tools (mpas_tools.mesh.conversion.cull) uses. The call in ocean/mesh/cull.py passes dir='.', so the temporary directory is created inside the current working directory. Omitting that argument avoids creating the temporary directory there (it is presumably created in the system default location instead).

In other words, the fix is changing calls like:


    dsCulledMesh = cull(dsBaseMesh, dsMask=dsLandMask,
                        dsPreserve=dsPreserve, logger=logger, dir='.')

to:

    dsCulledMesh = cull(dsBaseMesh, dsMask=dsLandMask,
                        dsPreserve=dsPreserve, logger=logger)
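
For context, the traceback shows the dir argument being handed to tempfile.TemporaryDirectory. The sketch below only illustrates how that standard-library call chooses a location. The helper name temp_parent is hypothetical, and it is an assumption (not checked against mpas_tools) that omitting dir in cull() results in dir=None being passed along.

```python
import os
import tempfile


def temp_parent(dir=None):
    """Return the directory in which TemporaryDirectory creates its folder."""
    with tempfile.TemporaryDirectory(dir=dir) as tempdir:
        return os.path.dirname(os.path.abspath(tempdir))


# dir='.' places the temporary directory in the current working directory
in_cwd = temp_parent(dir='.')

# dir=None falls back to the system default (e.g. $TMPDIR or /tmp)
in_default = temp_parent()
```

If the working directory and the default temporary location sit on different file systems, the two calls can behave differently at cleanup time, which would fit the observation that dropping dir='.' avoids the error.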

Searching around suggests that the "OSError: [Errno 39] Directory not empty" error may be caused by a race condition, but no one seems very sure. I can confirm that the program creates a temporary directory that contains files during the run and contains no files (hidden or otherwise) after the crash. That is consistent with a race condition: the files were removed, but the directory did not yet appear empty when os.rmdir was called.
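
If the error really is a timing issue (for instance, a file system that is slow to report deleted files, which is only a guess here), one generic workaround is to retry the removal after a short pause. This is a minimal sketch of that idea and not something compass or MPAS-Tools does: remove_tree_with_retry and its parameters are hypothetical names, and the placeholder file only stands in for real output.

```python
import os
import shutil
import tempfile
import time


def remove_tree_with_retry(path, attempts=5, delay=0.5):
    """Remove a directory tree, retrying if removal fails with an OSError."""
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Already gone, so there is nothing left to remove
            return True
        except OSError:
            # e.g. [Errno 39] Directory not empty: wait and try again
            if attempt == attempts - 1:
                return False
            time.sleep(delay)
    return False


# Usage: create a scratch directory with one file, then remove it
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'placeholder.txt'), 'w') as f:
    f.write('placeholder')
removed = remove_tree_with_retry(workdir)
```

On Python 3.10 and later, tempfile.TemporaryDirectory also accepts ignore_cleanup_errors=True, which suppresses errors raised during cleanup instead of retrying.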

xylar commented

@nickheavens-cgg, we have had the opposite problem on a lot of our machines: we are unable to create temporary directories in the default location (e.g. because of file system issues or because we are out of space). We have found that using the current directory solves this, which is why we have needed dir='.' for our workflows.

I recently stopped removing the temporary directory in the cull() function, so with the latest compass and MPAS-Tools you should no longer see the "OSError: [Errno 39] Directory not empty" error, because no attempt is made to remove the directory. See this MPAS-Tools commit:
MPAS-Dev/MPAS-Tools@8bdcbbb
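
As a rough illustration of the idea (this is not the code from that commit), a temporary directory can be created with tempfile.mkdtemp, which never deletes it, instead of the TemporaryDirectory context manager that appears in the traceback. The function names and the placeholder file name below are hypothetical.

```python
import os
import tempfile


def run_in_persistent_tempdir(work, dir=None):
    """Call work(tempdir) in a new temporary directory that is left on disk."""
    # mkdtemp creates the directory but, unlike TemporaryDirectory,
    # does not remove it afterwards, so no rmdir can fail
    tempdir = tempfile.mkdtemp(dir=dir)
    result = work(tempdir)
    return result, tempdir


def write_placeholder(tempdir):
    """Stand-in for the real work: write one file and return its path."""
    path = os.path.join(tempdir, 'placeholder.txt')
    with open(path, 'w') as f:
        f.write('placeholder')
    return path


path, tempdir = run_in_persistent_tempdir(write_placeholder)
```

The trade-off is that the directory and its contents stay behind after the run and have to be cleaned up separately.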

Let me know if checking out the latest compass and recreating your conda environment using the latest main fixes the issue for you.