adjtomo/seisflows

Code stops with errors

flywithheart opened this issue · 1 comments

When I run the 'inversion' workflow with the 'slurm_sm' system, the following errors always occur, regardless of the cluster or model I use. An inversion with the same models and parameters (except the parameters for slurm_sm) works well if I use the 'multithreaded' system option, so I think the model parameters for the inversion are correct.

Has anyone come across these problems, or does anyone have an idea how to solve them? Thank you very much!

WARNING: f0 != PAR.F0
WARNING: f0 != PAR.F0
Traceback (most recent call last):
File "/home1/03419/liren/seisflows-master/seisflows/system/wrappers/run", line 41, in
func(**kwargs)
File "/home1/03419/liren/seisflows-master/seisflows/solver/base.py", line 199, in eval_func
self.export_residuals(path)
File "/home1/03419/liren/seisflows-master/seisflows/solver/base.py", line 420, in export_residuals
unix.mkdir(join(path, 'residuals'))
File "/home1/03419/liren/seisflows-master/seisflows/tools/unix.py", line 81, in mkdir
os.makedirs(dir)
File "/home1/03419/liren/anaconda2/envs/obspy1.0.3/lib/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 17] File exists: '/work/03419/liren/stampede2/tests/marmousi_offshore/scratch/evalfunc/residuals'
srun: error: c495-044: task 11: Exited with exit code 1
Generating data

Starting iteration 1
Generating synthetics
Computing gradient
Computing search direction
Computing step length
trial step 1
trial step 2
trial step 3
trial step 4

Starting iteration 2
Generating synthetics
Computing gradient
Computing search direction
Computing step length
trial step 1

Starting iteration 3
Generating synthetics
Computing gradient
Computing search direction
...
..
..
Starting iteration 21
Generating synthetics
Computing gradient
Computing search direction
Computing step length
trial step 1

Starting iteration 22
Generating synthetics
Computing gradient
Computing search direction
Computing step length
trial step 1
Traceback (most recent call last):
File "/tmp/slurmd/job1005876/slurm_script", line 24, in
workflow.main()
File "/home1/03419/liren/seisflows-master/seisflows/workflow/inversion.py", line 128, in main
self.line_search()
File "/home1/03419/liren/seisflows-master/seisflows/workflow/inversion.py", line 185, in line_search
self.evaluate_function()
File "/home1/03419/liren/seisflows-master/seisflows/workflow/inversion.py", line 212, in evaluate_function
path=PATH.FUNC)
File "/home1/03419/liren/seisflows-master/seisflows/system/slurm_sm.py", line 125, in run
+ '%s ' % PAR.ENVIRONS)
File "/home1/03419/liren/seisflows-master/seisflows/tools/tools.py", line 31, in call
subprocess.check_call(*args, **kwargs)
File "/home1/03419/liren/anaconda2/envs/obspy1.0.3/lib/python2.7/subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'srun --wait=0 /home1/03419/liren/seisflows-master/seisflows/system/wrappers/run /work/03419/liren/stampede2/tests/marmousi_offshore/output solver eval_func ' returned non-zero exit status 1
1

When this error happens, the residual file for source number 11 (/scratch/evalfunc/residuals/000011) is not created:
OSError: [Errno 17] File exists: '/work/03419/liren/stampede2/tests/marmousi_offshore/scratch/evalfunc/residuals'
srun: error: c495-044: task 11: Exited with exit code 1

When the second error happens, the code exits:

subprocess.CalledProcessError: Command 'srun --wait=0 /home1/03419/liren/seisflows-master/seisflows/system/wrappers/run /work/03419/liren/stampede2/tests/marmousi_offshore/output solver eval_func ' returned non-zero exit status 1

I know this is a late answer, but it may help someone else in the future. This kind of error is typically caused by a mix-up between parallel processes that write files at the same time. That is why some "sleep" calls have been added (see, for example, the mkdir function in seisflows/tools/unix.py).
When you see this kind of message (which magically disappears when using a debugger), try adding a few more seconds of sleep time in mkdir or elsewhere. Run grep -rni "sleep" * in the main seisflows folder to see some examples of what has been done.
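
As an alternative to tuning sleep times, the directory creation itself can be made tolerant of another task getting there first. The following is only a minimal sketch under that assumption: `safe_mkdir` is a hypothetical helper, not the actual mkdir in seisflows/tools/unix.py, and it has not been tested inside SeisFlows.

```python
import errno
import os


def safe_mkdir(path):
    """Create `path` and its parents, tolerating a concurrent creator.

    Hypothetical helper for illustration; not part of SeisFlows.
    """
    try:
        os.makedirs(path)
    except OSError as exc:
        # Another task may have created the directory between any earlier
        # existence check and this call. Ignore only that case, and re-raise
        # everything else (permissions, or a regular file in the way).
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

With a helper like this, several srun tasks that all try to create scratch/evalfunc/residuals at once would each return normally instead of one of them failing with Errno 17. On Python 3, `os.makedirs(path, exist_ok=True)` behaves the same way; the traceback above comes from Python 2.7, which does not have that argument.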