Melissa writer unusable
christoph-conrads opened this issue · 11 comments
When Code_Saturne 6.0.5 or Code_Saturne 6.2.0 is run with the writer format "melissa", the solver terminates with an error message before calling any Melissa code. Please let me know if you need more information.
Attachments:
- Makefile for Code_Saturne and dependencies (CGNS, med, Melissa, tested on Ubuntu 20, must be compressed because GitHub refuses Makefile attachments)
- Code_Saturne input (Melissa fluid example) with case1.xml and pre-processed mesh
$ env LD_LIBRARY_PATH=/tmp/prefix-6.2.0/lib /tmp/prefix-6.2.0/bin/code_saturne run --param=case1.xml
Warning:
'run.cfg' not found in case directory; case update recommended.
code_saturne
************
Version: 6.2.0
Path: /tmp/prefix-6.2.0
Result directory:
/home/ubuntu/workspace/RESU/20201219-1758_1
****************************************
Compiling user subroutines and linking
****************************************
****************************
Preparing calculation data
****************************
Single processor code_saturne simulation.
***************************
Preprocessing calculation
***************************
**********************
Starting calculation
**********************
solver script exited with status 1.
Error running the calculation.
Check code_saturne log (listing) and error* files for details.
*****************************
Post-calculation operations
*****************************
Error in calculation stage.
$ cat RESU/20201219-1718/error
System error: File exists
../../../code_saturne-6.2.0/src/fvm/fvm_to_melissa.c:326: Fatal error.
Error creating Melissa writer: "results":
only one Melissa server may be used, and is already used by
writer: "joining".
Call stack:
1: 0x7fd1450f0de0 <fvm_to_melissa_init_writer+0x580> (fvm_melissa.so)
2: 0x7fd14696c588 <+0x98588> (libsaturne-6.2.so)
3: 0x7fd14696d746 <fvm_writer_init+0x486> (libsaturne-6.2.so)
4: 0x7fd146a419e5 <+0x16d9e5> (libsaturne-6.2.so)
5: 0x7fd146a42ab8 <+0x16eab8> (libsaturne-6.2.so)
6: 0x7fd146a4732c <cs_post_write_meshes+0xac> (libsaturne-6.2.so)
7: 0x7fd146a4aaa7 <cs_post_init_meshes+0x6d7> (libsaturne-6.2.so)
8: 0x7fd147821678 <main+0x2f8> (libcs_solver-6.2.so)
9: 0x7fd1467000b3 <__libc_start_main+0xf3> (libc.so.6)
10: 0x55770341408e <_start+0x2e> (cs_solver)
End of stack
Hello,
This is related to the fact that "auxiliary" writers use the same format as the "main" writer and, as the error message explains, only one Melissa server can be used. Using multiple Melissa servers would require Melissa to provide a feature or API option for connecting to separate servers from the same code, which does not seem to exist in the present API. So this is a Melissa issue, not a code_saturne one.
In your case, I assume that a joining or periodicity mesh preprocessing step is activated, given the error message. With the default visualization level, a writer is used to check that the joining is correct. The joining information preprocessed here carries no physical field information, so it is not relevant for Melissa.
There are three possible solutions here:
- lower the visualization level for mesh joining (set it to zero)
- an even better solution is to run the preprocessing in a separate step, and use the mesh_output (with no additional preprocessing for the CFD computation).
- do not set the type of the main (results) writer to Melissa, but add a separate writer, and associate post-processing meshes to that writer instead of "results" (though using both is possible).
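The conflict described above can be illustrated with a minimal C sketch. It is not the code from fvm_to_melissa.c: the function `claim_single_server` and the variable `server_owner` are hypothetical names, and the sketch only assumes that a single process-wide slot is held by the first writer that requests it, so that a second writer using the same format is refused.

```c
#include <stdio.h>
#include <string.h>

#define OWNER_NAME_MAX 64

/* Name of the writer currently holding the single server slot
   (empty string means the slot is free). Hypothetical state. */
static char server_owner[OWNER_NAME_MAX] = "";

/* Try to claim the single server for writer_name.
   Returns 0 on success, -1 if another writer already holds it;
   in that case a message naming both writers is written to err. */
static int claim_single_server(const char *writer_name,
                               char *err, size_t err_size)
{
    if (server_owner[0] != '\0') {
        snprintf(err, err_size,
                 "Error creating writer: \"%s\": only one server may be "
                 "used, and is already used by writer: \"%s\".",
                 writer_name, server_owner);
        return -1;
    }

    /* First caller wins; remember its name for later diagnostics. */
    snprintf(server_owner, sizeof server_owner, "%s", writer_name);
    if (err_size > 0)
        err[0] = '\0';
    return 0;
}
```

Under these assumptions, claiming the slot for "joining" first and for "results" second makes the second call fail, which mirrors the order implied by the error message.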
Hi Yvan, thank you for these proposals.
- lower the visualization level for mesh joining (set it to zero)
Relevant DATA/case1.xml snippet:
<solution_domain>
<joining>
<face_joining name="1">
<visualization>0</visualization>
This does not work for me, I get the same error message as before.
- do not set the type of the main (results) writer to Melissa, but add a separate writer, and associate post-processing meshes to that writer instead of "results" (though using both is possible).
Here are the changes that I made (see screenshots below):
- Have a postprocessing results writer that does not have Format "melissa".
- Add a postprocessing writer with Format "melissa".
- Go to postprocessing -> Mesh and for every mesh type, select the writer with format melissa as associated writer.
This causes an error message that is only visible in the GUI log window and cannot be found in any log file:
**********************
Starting calculation
**********************
*** The MPI_Comm_dup() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[my-hostname:99434] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*****************************
Post-calculation operations
I am checking if the Melissa code is causing the error message.
Code_Saturne 6.0.5 contains two files with calls to MPI_Comm_dup, and neither of them is related to Melissa code:
code_saturne-6.0.5$ fgrep -lr MPI_Comm_dup
src/base/cs_file.c
src/fvm/fvm_to_catalyst.cxx
Increasing the number of processes to 2 fixes this problem and makes Code_Saturne run the Melissa initialization code. To change the number of processes in the GUI, go to "Run computation" and search for "Number of processes" in the new window.
Hello,
Do you have a "run_solver.log" file in the execution (RESU/<run_id>) directory? Could you post it?
Otherwise, do you have the possibility of running under a debugger (set "gdb" in the advanced run options in the GUI) and putting a breakpoint on MPI_Comm_dup? It may also be called in a loaded library, or some other subtle initialization effect may occur.
Hi Yvan,
Do you have a "run_solver.log" file in the execution (RESU/<run_id>) directory? Could you post it?
Yes, here it is: run_solver.log
You can also have a copy of the Code_Saturne working directory including the directories DATA, MESH, and RESU.
Otherwise, do you have the possibility of running under a debugger (set "gdb" in the advanced run options in the GUI) and putting a breakpoint on MPI_Comm_dup? It may also be called in a loaded library, or some other subtle initialization effect may occur.
MPI_Comm_dup is called in the Melissa source code outside of Code_Saturne. I need to take a look at the call site.
(gdb) inferior 1
[Switching to inferior 1 [process 53313] (/usr/bin/python3)]
[Switching to thread 1.1 (Thread 0x7ffff7c00740 (LWP 53313))]
#0 0x00007ffff7eb1dba in __GI___wait4 (pid=53327, stat_loc=0x7fffffffc40c, options=0, usage=0x0)
at ../sysdeps/unix/sysv/linux/wait4.c:27
27 in ../sysdeps/unix/sysv/linux/wait4.c
(gdb) continue
Continuing.
**********************
Starting calculation
**********************
[New inferior 12 (process 53328)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so...
Reading symbols from /usr/lib/debug/.build-id/4f/c5fc33f4429136a494c640b113d76f610e4abc.debug...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libutil-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so...
process 53328 is executing new program: /usr/bin/bash
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so...
[New inferior 13 (process 53329)]
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so...
process 53329 is executing new program: /tmp/workspace/RESU/20210122-1718/cs_solver
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so...
Reading symbols from /usr/lib/debug/.build-id/4f/c5fc33f4429136a494c640b113d76f610e4abc.debug...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libutil-2.31.so...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libnss_files-2.31.so...
Reading symbols from /tmp/prefix/lib/libmelissa.so.0.7.0...
Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libresolv-2.31.so...
[New Thread 0x7ffff4d94700 (LWP 53330)]
[New Thread 0x7ffff4593700 (LWP 53331)]
[Switching to Thread 0x7ffff56a2240 (LWP 53329)]
Thread 13.1 "cs_solver" hit Breakpoint 1, 0x00007ffff63d2470 in PMPI_Comm_dup () from /lib/x86_64-linux-gnu/libmpi.so.40
(gdb) bt fu
#0 0x00007ffff63d2470 in PMPI_Comm_dup () from /lib/x86_64-linux-gnu/libmpi.so.40
No symbol table info available.
#1 0x00007ffff51e3464 in melissa_init_internal (field_name=0x7fffffffdcf0 "boundary zone id", local_vect_size=3585, comm_size=1,
rank=0, comm=0x7ffff647c3e0 <ompi_mpi_comm_world>) at /home/ubuntu/melissa/src/api/melissa_api.c:460
simu_id_a = 0x646920656e6f7a <error: Cannot access memory at address 0x646920656e6f7a>
server_node_name = 0x0
port_name = '\000' <repeats 288 times>
i = 32767
j = -1366382016
ret = 0
simu_id = -1
file = 0x0
linger = -1
master_node_names = 0x0
master_requester = 0x0
first_init = 1
field_data_ptr = 0x0
msg = {
_ = "ໝ\366\377\177\000\000\000\000\000\000\000\000\000\000SAGES/libc.mo\000\000\000\000\000\000\000\000\000\b@\000\000\000\000\000\000\000\000mory\000: \000%s%s%s\000%"}
buf_ptr = 0x0
__PRETTY_FUNCTION__ = "melissa_init_internal"
__func__ = "melissa_init_internal"
master_node_name = '\000' <repeats 26 times>, " \000 \000\227X\016\032\374\252?\000\000\000\000\000\000\000\000\000\227X\016\032\374\252?\000\000\000\000\000\000\000\000\232\231\231\231\231\231\251?", '\000' <repeats 14 times>, "\006@", '\000' <repeats 24 times>, "\377\377\377\377\000\000\000\000\005\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000`\246 \365\377\177\000\000\300\347\\UUU\000\000\207\001\376\367\377\177\000\000\001", '\000' <repeats 15 times>, "\377\377\377\377\000\000\000\000`_\240\366\377\177\000\000\360\336\377\377\377\177\000\000"...
#2 0x00007ffff51e5a91 in melissa_init_no_mpi (field=0x7fffffffdcf0 "boundary zone id", vector_size=3585)
at /home/ubuntu/melissa/src/api/melissa_api.c:1491
rank = 0
comm_size = 1
#3 0x00007ffff520a83a in _field_c_output () from /tmp/prefix/lib/code_saturne/fvm_melissa.so
No symbol table info available.
#4 0x00007ffff6a84cbc in fvm_writer_field_helper_output_e () from /tmp/prefix/lib/libsaturne-6.0.so
No symbol table info available.
#5 0x00007ffff520b604 in fvm_to_melissa_export_field () from /tmp/prefix/lib/code_saturne/fvm_melissa.so
No symbol table info available.
#6 0x00007ffff6a82882 in fvm_writer_export_field () from /tmp/prefix/lib/libsaturne-6.0.so
No symbol table info available.
#7 0x00007ffff6b4eea9 in _cs_post_write_mesh () from /tmp/prefix/lib/libsaturne-6.0.so
No symbol table info available.
#8 0x00007ffff6b52b1b in cs_post_write_meshes () from /tmp/prefix/lib/libsaturne-6.0.so
No symbol table info available.
#9 0x00007ffff6b567f6 in cs_post_init_meshes () from /tmp/prefix/lib/libsaturne-6.0.so
No symbol table info available.
#10 0x00007ffff7fc5882 in cs_run () from /tmp/prefix/lib/libcs_solver-6.0.so
No symbol table info available.
#11 0x00007ffff7fc5561 in main () from /tmp/prefix/lib/libcs_solver-6.0.so
No symbol table info available.
#12 0x00007ffff68170b3 in __libc_start_main (main=0x7ffff7fc53c0 <main>, argc=1, argv=0x7fffffffe438, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe428) at ../csu/libc-start.c:308
self = <optimized out>
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {93824992237136, -729303806703755678, 93824992235616, 140737488348208, 0, 0,
729303805743930978, 729322622568570466}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x1, 0x7fffffffe438}, data = {
prev = 0x0, cleanup = 0x0, canceltype = 1}}}
not_first_call = <optimized out>
#13 0x000055555555508e in _start ()
No symbol table info available.
Otherwise, do you have the possibility of running under a debugger (set "gdb" in the advanced run options in the GUI) and putting a breakpoint on MPI_Comm_dup? It may also be called in a loaded library, or some other subtle initialization effect may occur.
Backtrace as above but with Code_Saturne debugging enabled (code_saturne-6.0.5/configure --enable-debug).
Thread 13.1 "cs_solver" hit Breakpoint 1, melissa_init_internal (field_name=0x0, local_vect_size=0, comm_size=0, rank=32767,
comm=0x20) at /home/ubuntu/melissa/src/api/melissa_api.c:427
warning: Source file is more recent than executable.
427 {
(gdb) bt fu
#0 melissa_init_internal (field_name=0x0, local_vect_size=0, comm_size=0, rank=32767, comm=0x20)
at /home/ubuntu/melissa/src/api/melissa_api.c:427
server_node_name = 0x0
port_name = "0.`UUU\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\240\037\000\000\377\377\000\000\060.`UUU\000\000\003\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000.|\376\367\377\177\000\000\334\337\377\377", '\000' <repeats 12 times>, "\340_\215\365\377\177\000\000\240\037\000\000\377\377\000\000\070\232\374\367\377\177\000\000(\335\377\377\377\177\000\000\222j\347\365\377\177\000\000\200\334\377\377\377\177\000\000\220\334\377\377\377\177\000\000\351\261\375\367\377\177\000\000\002", '\000' <repeats 31 times>, "\220\347\\UUU\000\000\220\347\\UUU\000\000boundary"...
i = 32767
j = -1366382016
ret = 0
simu_id = 0
file = 0xff00000000000000
linger = 0
master_node_names = 0x0
master_requester = 0x797261646e756f62
first_init = 1
field_data_ptr = 0x646920656e6f7a20
msg = {
_ = "\340[\343\365\377\177\000\000\000\000\000\000\000\000\000\000SAGES/libc.mo", '\000' <repeats 19 times>, "mory\000: \000%s%s%s\000%"}
buf_ptr = 0x20797261646e756f <error: Cannot access memory at address 0x20797261646e756f>
__PRETTY_FUNCTION__ = "melissa_init_internal"
__func__ = "melissa_init_internal"
master_node_name = '\000' <repeats 26 times>, " \000 \206\265ZID\224\351?\000\000\000\000\000\000\000\000\351EH\233[I\362\277\000\000\000\000\000\000\000\000\240\006\262\365\377\177\000\000\240\020\216\365\377\177\000\000\240\006\262\365\377\177\000\000\240 \216\365\377\177\000\000\240\006\262\365\377\177\000\000\240\060\216\365\377\177\000\000\020\340\377\377\377\177\000\000\200PUUUU\000\000\060\344\377\377\377\177", '\000' <repeats 18 times>, "\207\001\376\367\377\177\000\000\001", '\000' <repeats 15 times>, "\020\340\377\377\377\177\000\000\b\000\346\365\377\177\000\000\060\337\377\377\377\177\000\000"...
#1 0x00007ffff463fa91 in melissa_init_no_mpi (field=0x7fffffffdd80 "boundary zone id", vector_size=3585)
at /home/ubuntu/melissa/src/api/melissa_api.c:1491
rank = 0
comm_size = 1
#2 0x00007ffff466493b in _field_c_output (context=0x7fffffffdff0, datatype=CS_DOUBLE, dimension=1, component_id=0, block_start=1,
block_end=3586, buffer=0x5555557890b0) at ../../../code_saturne-6.0.5/src/fvm/fvm_to_melissa.c:231
c = 0x7fffffffdff0
w = 0x5555557b00d0
n_values = 3585
__PRETTY_FUNCTION__ = "_field_c_output"
values = 0x5555557890b0
tmpn = "boundary zone id\000\000\000\000\341\r\000\000F\000\000\000\000\000\000\000\340\335\377\377\377\177\000\000\205\225*\366\377\177\000\000\210\376xUUU\000\000\034\204\342\366\377\177\000\000\002", '\000' <repeats 11 times>, "F\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\320\336\377\377\377\177\000\000\371\274\355\365\377\177\000\000\005\000\000\000\000\000\000\000\003\000\000\000\000\000\000"
tmpe = "\000\000\000\000\000"
c_name = 0x7fffffffdd80 "boundary zone id"
lce = 0
l = 17
use_melissa_mpi = false
#3 0x00007ffff5edbe5b in _field_helper_output_el (helper=0x5555557afb20, context=0x7fffffffdff0, export_section=0x5555555ce7a0,
src_dim=1, src_interlace=CS_INTERLACE, comp_order=0x0, n_parent_lists=1, parent_num_shift=0x7fffffffe1bc, datatype=CS_INT32,
field_values=0x7fffffffe1c0, output_func=0x7ffff4664679 <_field_c_output>)
at ../../../code_saturne-6.0.5/src/fvm/fvm_writer_helper.c:813
comp_id = 0
h = 0x5555557afb20
n_sections = 2
sub_size = 3585
n_elements = 3585
values = 0x5555557890b0 ""
current_section = 0x0
elt_size = 8
n_dim_loops = 1
convert_dim = 1
#4 0x00007ffff5eddfe6 in fvm_writer_field_helper_output_e (helper=0x5555557afb20, context=0x7fffffffdff0,
export_section=0x5555555ce7a0, src_dim=1, src_interlace=CS_INTERLACE, comp_order=0x0, n_parent_lists=1,
parent_num_shift=0x7fffffffe1bc, datatype=CS_INT32, field_values=0x7fffffffe1c0, output_func=0x7ffff4664679 <_field_c_output>)
at ../../../code_saturne-6.0.5/src/fvm/fvm_writer_helper.c:2297
__PRETTY_FUNCTION__ = "fvm_writer_field_helper_output_e"
#5 0x00007ffff4665651 in fvm_to_melissa_export_field (this_writer_p=0x5555557b00d0, mesh=0x555555602e30,
name=0x7ffff6eb6ee8 "boundary zone id", location=FVM_WRITER_PER_ELEMENT, dimension=1, interlace=CS_INTERLACE, n_parent_lists=1,
parent_num_shift=0x7fffffffe1bc, datatype=CS_INT32, time_step=-1, time_value=0, field_values=0x7fffffffe1c0)
at ../../../code_saturne-6.0.5/src/fvm/fvm_to_melissa.c:678
w = 0x5555557b00d0
c = {writer = 0x5555557b00d0, name = 0x7ffff6eb6ee8 "boundary zone id", time_step = -1, call_init = true}
helper = 0x5555557afb20
export_list = 0x5555555ce7a0
n_ranks = 1
f_id = 0
#6 0x00007ffff5eda576 in fvm_writer_export_field (this_writer=0x5555555cf310, mesh=0x555555602e30,
name=0x7ffff6eb6ee8 "boundary zone id", location=FVM_WRITER_PER_ELEMENT, dimension=1, interlace=CS_INTERLACE, n_parent_lists=1,
parent_num_shift=0x7fffffffe1bc, datatype=CS_INT32, time_step=-1, time_value=0, field_values=0x7fffffffe1c0)
at ../../../code_saturne-6.0.5/src/fvm/fvm_writer.c:1510
t0 = {wall_sec = 1611338646, wall_nsec = 317423397, cpu_sec = 0, cpu_nsec = 32952908}
t1 = {wall_sec = 140737488347440, wall_nsec = 140737323286822, cpu_sec = -281470681808014, cpu_nsec = 140737488347456}
export_field_func = 0x7ffff4665356 <fvm_to_melissa_export_field>
__PRETTY_FUNCTION__ = "fvm_writer_export_field"
format_writer = 0x5555557b00d0
#7 0x00007ffff6099808 in _cs_post_write_fixed_zone_info (writer=0x5555555cf310, post_mesh=0x555555595428, nt_cur_abs=-1, t_cur_abs=0)
at ../../../code_saturne-6.0.5/src/base/cs_post.c:2091
parent_num_shift = {0}
_nt_cur_abs = -1
_t_cur_abs = 0
__PRETTY_FUNCTION__ = "_cs_post_write_fixed_zone_info"
output = true
var_ptr = {0x5555557716c0}
name = 0x7ffff6eb6ee8 "boundary zone id"
#8 0x00007ffff6099b4b in _cs_post_write_mesh (post_mesh=0x555555595428, ts=0x0)
at ../../../code_saturne-6.0.5/src/base/cs_post.c:2237
j = 0
time_dep = FVM_WRITER_FIXED_MESH
write_mesh = true
writer = 0x555555599c70
nt_cur = -1
t_cur = 0
#9 0x00007ffff609e7b6 in cs_post_write_meshes (ts=0x0) at ../../../code_saturne-6.0.5/src/base/cs_post.c:5046
i = 1
post_mesh = 0x555555595428
t_top_id = 0
#10 0x00007ffff60a07ee in cs_post_init_meshes (check_mask=0) at ../../../code_saturne-6.0.5/src/base/cs_post.c:6224
n_probe_sets = 1
#11 0x00007ffff7fc57f5 in cs_run () at ../../../code_saturne-6.0.5/src/apps/cs_solver.c:338
ivoset = 0
check_mask = 0
halo_type = CS_HALO_STANDARD
#12 0x00007ffff7fc5d55 in main (argc=1, argv=0x7fffffffe438) at ../../../code_saturne-6.0.5/src/apps/cs_solver.c:678
s_param = "setup.xml"
t_id = <optimized out>
#13 0x00007ffff5c710b3 in __libc_start_main (main=0x7ffff7fc5ba3 <main>, argc=1, argv=0x7fffffffe438, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe428) at ../csu/libc-start.c:308
self = <optimized out>
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {93824992236592, 8396096192585510322, 93824992235648, 140737488348208, 0, 0,
-8396096191901618766, -8396109870866125390}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x1, 0x7fffffffe438},
data = {prev = 0x0, cleanup = 0x0, canceltype = 1}}}
not_first_call = <optimized out>
#14 0x00005555555550ae in _start ()
Hi @YvanFournier, would it be ok for Melissa to call MPI_Init if this was not done before?
Hello,
Yes, as long as you check using MPI_Initialized whether it was already initialized or not.
I believe ParaView Catalyst does something similar.
Best regards,
Yvan
Hi Yvan,
then I will modify the Melissa code accordingly.
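The check suggested above can be sketched in C as follows. This is a minimal illustration and not the actual Melissa change: `ensure_mpi_initialized` is a hypothetical helper name, and the stand-in definitions at the top exist only so that the sketch compiles without an MPI installation (define the hypothetical macro `USE_REAL_MPI` to build against a real `<mpi.h>`). `MPI_Initialized` and `MPI_Init` are the standard MPI calls.

```c
#include <stddef.h>

#ifdef USE_REAL_MPI
#include <mpi.h>
#else
/* Stand-ins (assumption for this sketch only): they mimic the two
   MPI calls used below so the example builds without MPI. */
#define MPI_SUCCESS 0
static int stub_mpi_ready = 0;

static int MPI_Initialized(int *flag)
{
    *flag = stub_mpi_ready;
    return MPI_SUCCESS;
}

static int MPI_Init(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    stub_mpi_ready = 1;
    return MPI_SUCCESS;
}
#endif

/* Hypothetical helper: make sure MPI is initialized before any
   communicator call such as MPI_Comm_dup is issued.
   Returns 1 if this call initialized MPI,
           0 if MPI had already been initialized by the host code,
          -1 on error. */
static int ensure_mpi_initialized(void)
{
    int initialized = 0;

    if (MPI_Initialized(&initialized) != MPI_SUCCESS)
        return -1;
    if (initialized)
        return 0;               /* the host code owns the MPI lifetime */

    /* Passing NULL for both arguments is permitted by the MPI standard. */
    if (MPI_Init(NULL, NULL) != MPI_SUCCESS)
        return -1;
    return 1;                   /* the caller now owns the MPI lifetime */
}
```

A return value of 1 would tell the library that it started MPI itself and is therefore the one that should later call `MPI_Finalize`; a return value of 0 means the host application handles that.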