deepmodeling/dpdispatcher

RuntimeError in make_model_devi step

wankiwi opened this issue · 5 comments

After updating dpdispatcher to 0.4.18, I get the following error when DPGEN runs the make_model_devi step. Downgrading dpdispatcher to 0.4.17 resolves it.

2022-09-20 04:00:40,341 - INFO : job: 31fcd1c1d95b2fedff35615bf29adbc61e3057e5 315398 finished
INFO:dpgen:-------------------------iter.000007 task 02--------------------------
INFO:dpgen:-------------------------iter.000007 task 03--------------------------
INFO:dpgen:-------------------------iter.000007 task 04--------------------------
Traceback (most recent call last):
  File "/home/kwwan/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/main.py", line 185, in main
    args.func(args)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 3914, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 3787, in run_iter
    run_model_devi (ii, jdata, mdata)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 1614, in run_model_devi
    run_md_model_devi(iter_index,jdata,mdata)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 1608, in run_md_model_devi
    submission.run_submission()
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/submission.py", line 176, in run_submission
    self.generate_jobs()
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/submission.py", line 340, in generate_jobs
    self.bind_machine(self.machine)
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/submission.py", line 163, in bind_machine
    self.machine.context.bind_submission(self)
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/ssh_context.py", line 389, in bind_submission
    self.block_checkcall(f"mv {old_remote_root} {self.remote_root}")
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/ssh_context.py", line 537, in block_checkcall
    raise RuntimeError("Get error code %d in calling %s through ssh with job: %s . message: %s" %
RuntimeError: Get error code 1 in calling mv /data/home/scv3616/run/wankw/temp/dpmd_remote/447fbf8e9ee0ecc33a67e8f01f1847a2d3888f29 /data/home/scv3616/run/wankw/temp/dpmd_remote/5b3271c64c830aca6cfc836322191dc2482054ad through ssh with job: 5b3271c64c830aca6cfc836322191dc2482054ad . message:

Could this be related to #261?

njzjz commented

Task 04 is the run_model_devi step. Your stderr message is empty. Can you try to execute it manually?
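For reference, a minimal sketch of one way to execute it manually. It rebuilds the `mv` command shown in the RuntimeError and reports the exit code, stdout, and stderr separately, so an empty stderr is easy to spot. The helper names and the host alias `my-cluster` are hypothetical placeholders, and this is not dpdispatcher's own code; it assumes plain `ssh` access to the remote machine.

```python
import shlex
import subprocess


def run_and_report(argv):
    """Run a command and return (exit_code, stdout, stderr) without raising."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def remote_mv_argv(host, src, dst):
    """Build an ssh command line that runs `mv src dst` on the remote host."""
    remote_cmd = f"mv {shlex.quote(src)} {shlex.quote(dst)}"
    return ["ssh", host, remote_cmd]


# Paths copied from the traceback; "my-cluster" is a placeholder host alias.
argv = remote_mv_argv(
    "my-cluster",
    "/data/home/scv3616/run/wankw/temp/dpmd_remote/447fbf8e9ee0ecc33a67e8f01f1847a2d3888f29",
    "/data/home/scv3616/run/wankw/temp/dpmd_remote/5b3271c64c830aca6cfc836322191dc2482054ad",
)
print(argv)

# Uncomment to actually run the command over ssh:
# code, out, err = run_and_report(argv)
# print("exit code:", code)
# print("stdout:", repr(out))
# print("stderr:", repr(err))
```

Checking first whether the source directory still exists and whether the target directory already exists on the remote side may also help narrow down why `mv` fails.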

> Task 04 is the run_model_devi step. Your stderr message is empty. Can you try to execute it manually?

I can't run it manually now, because I have already downgraded dpdispatcher to 0.4.17, and with that version run_model_devi completed successfully.
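In the traceback, dpgen is loaded from `~/.local/lib/python3.8/site-packages` while dpdispatcher is loaded from the Anaconda `site-packages`. It may therefore be worth confirming which dpdispatcher copy and version are active after the downgrade. A small sketch using only the standard library (the helper names are illustrative, not part of dpgen or dpdispatcher):

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = find_spec(name)
    return spec.origin if spec is not None else None


# Prints None for either value if dpdispatcher is not installed
# in the interpreter running this snippet.
print("dpdispatcher version:", installed_version("dpdispatcher"))
print("dpdispatcher location:", module_location("dpdispatcher"))
```

Running this with the same interpreter that launches `dpgen` shows whether 0.4.17 or 0.4.18 is the one being imported.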

njzjz commented

I can't reproduce it. @HuangJiameng Do you have the same issue?

> I can't reproduce it. @HuangJiameng Do you have the same issue?

I don't know whether this helps you reproduce the error, but I used this branch in order to use the merge_traj function.