venturi123/DRLinFluids

pass tests

Hello. Your work is of great help to my research. Thank you for open-sourcing such a good project. I encountered a small problem when using Docker to reproduce your work. When I run the launch_multiprocessing_traning_square.py file in the terminal, the following error is reported:

$ vi launch_multiprocessing_traning_square.py
$ cd ..
$ python DRLinFluids_square/launch_multiprocessing_traning_square.py
/Drlinfluids/Square2D_Multiprocessing
['env01', 'env02', 'env03', 'env04', 'env05', 'env06']
/bin/bash: decomposePar: command not found
Traceback (most recent call last):
File "DRLinFluids_square/launch_multiprocessing_traning_square.py", line 369, in <module>
test_sac_with_il()
File "DRLinFluids_square/launch_multiprocessing_traning_square.py", line 160, in test_sac_with_il
env = envobject_square.FlowAroundSquare2D(
File "/DRLinfluids/square2D_multiprocessing/DRLinFluids_square/DRLinFluids/environments_tianshou.py", line 112, in __init__
cfd.run_init(foam_root_path, foam_params)
File "/DRLinfluids/square2D_multiprocessing/DRLinFluids_square/DRLinFluids/utils.py", line 414, in wrapper
func(*args, **kwargs)
File "/DRLinfluids/square2D_multiprocessing/DRLinFluids_square/DRLinFluids/cfd.py", line 134, in run_init
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cd /DRLinfluids/square2D_multiprocessing/env01 && decomposePar -force > /dev/null' returned non-zero exit status 127.

To find the problem, I added print(root_path.title()) and print(env_name_list), and both printed the correct results. According to the code, the command cd /DRLinfluids/square2D_multiprocessing/env01 && decomposePar -force should force decomposePar to run, so I don't know why it reports an error. What puzzles me is that when I downloaded the complete DRLinfluids folder from the Docker container's files and ran it with VS under WSL, it ran normally. I repeatedly checked the folder from which I start the run and did not find the problem. Could you please look into this for me? Thank you for your time; I look forward to your reply.
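For context, an exit status of 127 from bash means the shell could not find the command, which matches the "decomposePar: command not found" line in the log. A minimal sketch of a pre-flight check, assuming the only tools of interest are the two OpenFOAM utilities named in the tracebacks in this thread (the helper name `missing_commands` is hypothetical and not part of DRLinFluids):

```python
import shutil

# Tool names taken from the tracebacks in this thread; extend as needed.
REQUIRED_TOOLS = ["decomposePar", "reconstructPar"]


def missing_commands(commands, search_path=None):
    """Return the commands that cannot be found on the search path.

    search_path=None means "use the PATH of the current process".
    """
    return [cmd for cmd in commands if shutil.which(cmd, path=search_path) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_TOOLS)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All required OpenFOAM tools were found.")
```

Running this in the same shell that launches the training script shows whether the OpenFOAM utilities are visible to that process before any environment is created.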

Hi @jwcy0529 ,

This error may be due to an incorrect configuration of the OpenFOAM environment in your code. If you are not using the Docker/Singularity environment we provided, please check if this line of code is appropriately configured. In addition, we highly recommend using the Docker/Singularity environment for training. You can modify and test based on the two cases we provided, which can avoid various errors caused by configuring the environment.

P.S. In my personal experience, WSL is not a good development environment for using DRLinFluids and may cause some strange errors. If you must set up a simulation environment on your own, I recommend using a real physical machine or a virtual machine such as VirtualBox.

Best,
Qiulei
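As an illustration of the configuration point above: the failing command in the first traceback calls decomposePar without loading an OpenFOAM environment first, whereas the command in the second traceback in this thread sources /opt/openfoam8/etc/bashrc before calling the utility. The sketch below shows how such a shell command string could be assembled with an optional environment file. The function `build_foam_command` and its parameters are hypothetical and are not DRLinFluids' actual API; the bashrc path is copied from the later traceback and may differ on other installations.

```python
import shlex


def build_foam_command(case_dir, tool_args, foam_bashrc=None):
    """Build a 'cd ... && [. bashrc &&] tool ...' shell command string.

    case_dir    -- directory of the OpenFOAM case (e.g. .../env01)
    tool_args   -- the utility and its arguments, e.g. ["decomposePar", "-force"]
    foam_bashrc -- optional OpenFOAM environment file to source first
    """
    steps = ["cd " + shlex.quote(case_dir)]
    if foam_bashrc is not None:
        # Sourcing the file puts the OpenFOAM utilities on PATH for this command.
        steps.append(". " + shlex.quote(foam_bashrc))
    steps.append(" ".join(shlex.quote(arg) for arg in tool_args))
    return " && ".join(steps)


if __name__ == "__main__":
    print(build_foam_command(
        "/DRLinfluids/square2D_multiprocessing/env01",
        ["decomposePar", "-force"],
        foam_bashrc="/opt/openfoam8/etc/bashrc",  # path as seen in this thread
    ))
```

Quoting each part with `shlex.quote` keeps the command safe if a case directory ever contains spaces.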

Hi @venturi123, I ran into the same problem on Ubuntu 20.04.
`(DRLinFluids) Singularity> python DRLinFluids_cylinder/launch_multiprocessing_traning_cylinder.py

[mpiexec@shipmechanic-workstation4] HYDU_create_process (utils/launch/launch.c:74): execvp error on file /usr/bin/hydra_pmi_proxy (No such file or directory)
[mpiexec@shipmechanic-workstation4] HYD_pmcd_pmiserv_proxy_init_cb (pm/pmiserv/pmiserv_cb.c:448): assert (!closed) failed
[mpiexec@shipmechanic-workstation4] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@shipmechanic-workstation4] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
[mpiexec@shipmechanic-workstation4] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
/home/wulong/anaconda3/bin/mpicc: line 285: x86_64-conda_cos6-linux-gnu-cc: command not found
Traceback (most recent call last):
File "DRLinFluids_cylinder/launch_multiprocessing_traning_cylinder.py", line 89, in <module>
env = envobject_cylinder.FlowAroundCylinder2D(
File "/home/wulong/jzz/DRLinFluids/examples/cylinder2D_multiprocessing/DRLinFluids_cylinder/DRLinFluids/environment_tensorforce.py", line 114, in __init__
cfd.run_init(foam_root_path, foam_params)
File "/home/wulong/jzz/DRLinFluids/examples/cylinder2D_multiprocessing/DRLinFluids_cylinder/DRLinFluids/utils.py", line 414, in wrapper
func(*args, **kwargs)
File "/home/wulong/jzz/DRLinFluids/examples/cylinder2D_multiprocessing/DRLinFluids_cylinder/DRLinFluids/cfd.py", line 144, in run_init
subprocess.run(
File "/opt/miniconda3/envs/DRLinFluids/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cd /home/wulong/jzz/DRLinFluids/examples/cylinder2D_multiprocessing/env01 && . /opt/openfoam8/etc/bashrc && reconstructPar > /dev/null' returned non-zero exit status 1.
`
Do you have any advice?

Hi @jiangzhangze ,

It seems that you are using your local conda environment, not the built-in one. Could you please show me the command for launching the script?

Besides, are you running on an HPC Cluster?
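The log above hints at this: mpicc is resolved from /home/wulong/anaconda3/bin, while the Python interpreter in the traceback lives under /opt/miniconda3/envs/DRLinFluids. A small sketch for spotting PATH entries that come from outside a set of expected prefixes follows. The function name and the list of allowed prefixes are assumptions chosen for illustration, not something the container documents.

```python
import os


def foreign_path_entries(path_value, allowed_prefixes):
    """Return PATH entries that are not located under any allowed prefix."""
    prefixes = [os.path.normpath(p) for p in allowed_prefixes]
    foreign = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        normalized = os.path.normpath(entry)
        inside = any(
            normalized == prefix
            or normalized.startswith(prefix.rstrip(os.sep) + os.sep)
            for prefix in prefixes
        )
        if not inside:
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    # Assumed "expected" locations inside the container; adjust to your setup.
    allowed = ["/opt", "/usr", "/bin", "/sbin"]
    for entry in foreign_path_entries(os.environ.get("PATH", ""), allowed):
        print("PATH entry outside the expected prefixes:", entry)
```

If a home-directory conda installation shows up in the output, tools such as mpicc or mpiexec may be picked up from the host rather than from the container.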

I am running it on my local machine.
Here is the full command sequence:

singularity shell DRLinfluids.sif
drl
cd DRLinfluids/examples/cylinder2D_multiprocessing

python DRLinFluids_cylinder/launch_multiprocessing_traning_cylinder.py

@jiangzhangze can you try to follow exactly the instructions from this comment: #10 (comment)?

Hi @jerabaul29, I can train it by following this comment. Do I need to follow it every time I enter the image?

Yes, I think so (except, of course, the steps for downloading and un-tarring the image :) ).

What should I do if I want to start an experiment without network access, for example on an HPC cluster?

Hi @jiangzhangze ,

In theory, singularity supports running on an HPC cluster. I suggest you follow this part to do the experiment (you can download the container first and upload it to the HPC).

Hi @jerabaul29 ,

That said, we have also received a few reports from the community about problems when running on an HPC cluster (#12 and #13).

I guess the main issue is related to MPI, but I don't have much of a clue yet. I'm still working on it and wonder if you have any suggestions about this situation.

True, this may be purely a cluster issue :) . But given the information provided, I was a bit unsure whether it could instead be a missing step in the setup; that's why I think going through it once, following exactly the same steps, could be good :) .

Hi @venturi123 and @jerabaul29. I mean: how can I skip the git clone https://github.com/venturi123/DRLinFluids.git step? I have tried to bind the DRLinFluids directory when opening a shell in the sif, but it doesn't seem to work.

I may not fully understand what happened, but I think you can download the DRLinFluids container and the whole repository first, then upload both of them to the cluster.

PS. The bind flag is intended only for binding your external storage device; please check the note section.
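For the offline workflow suggested above, a quick check that both uploaded pieces are in place before launching can save a failed job submission. This is only a sketch: the function name is hypothetical, the image name DRLinfluids.sif is taken from the command shown earlier in this thread, and the relative script path is inferred from the tracebacks, so adjust all three to your own layout.

```python
from pathlib import Path


def missing_offline_assets(sif_path, repo_dir, launch_script):
    """Return a list of human-readable problems; an empty list means all is present.

    sif_path      -- path to the uploaded container image
    repo_dir      -- path to the uploaded copy of the repository
    launch_script -- launch script path relative to repo_dir
    """
    sif, repo = Path(sif_path), Path(repo_dir)
    problems = []
    if not sif.is_file():
        problems.append(f"container image not found: {sif}")
    if not repo.is_dir():
        problems.append(f"repository directory not found: {repo}")
    elif not (repo / launch_script).is_file():
        problems.append(f"launch script not found: {repo / launch_script}")
    return problems


if __name__ == "__main__":
    # Paths below are assumptions based on names that appear in this thread.
    for problem in missing_offline_assets(
        "DRLinfluids.sif",
        "DRLinFluids",
        "examples/cylinder2D_multiprocessing/DRLinFluids_cylinder/"
        "launch_multiprocessing_traning_cylinder.py",
    ):
        print(problem)
```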

Thank you @jerabaul29 , I will update the README to give more detailed step-by-step instructions and see if that is the problem.