MESAHub/mesa

finite `op_split_burn_min_T_for_variable_T_solver` causes segfault

mathren opened this issue · 4 comments

Describe the bug
When using op_split_burn = .true. together with a finite value of op_split_burn_min_T_for_variable_T_solver (instead of the default 1d99), one obtains an immediate segfault.

To Reproduce
Steps to reproduce the behavior:
I uploaded here a minimal working example for r23.05.1 that runs a 3Msun star and turns on op_split_burn at logT=7.3. The error occurs at step 67 (for convenience, the photos folder contains a photo calculated with r23.05.1 at step 65).
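
For readers who do not want to download the example, here is a minimal sketch of the relevant &controls settings. It is not copied from the uploaded inlist: it assumes the switch-on temperature is set through op_split_burn_min_T (control name recalled from controls.defaults, worth double-checking), and both numerical values are placeholders chosen to match logT=7.3.

```fortran
&controls
   ! Illustrative values only; the actual MWE inlist is not reproduced here.
   op_split_burn = .true.

   ! Assumed control for the temperature at which op_split_burn turns on.
   ! 10**7.3 is about 2d7 K.
   op_split_burn_min_T = 2d7

   ! Any finite value here (the default is 1d99) is what triggers the
   ! segfault described above. The number itself is a placeholder.
   op_split_burn_min_T_for_variable_T_solver = 2d7
/ ! end of controls namelist
```

Leaving op_split_burn_min_T_for_variable_T_solver at its default of 1d99 avoids the crash.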

Expected behavior
I realize this is probably due to something not fully implemented -- I would recommend noting in the documentation/controls.defaults that this is an experimental setting not yet ready for use.

Screenshots
I copy here the full backtrace. Line 122 in $MESA_DIR/net/private/net_initialize.f90 is an empty line (in r23.05.1).

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
In file '../private/net_initialize.f90', around line 122: Error allocating 7432401508362160128 bytes: Cannot allocate memory

Error termination. Backtrace:
#0  0x7f9966f7308f in ???
	at /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1  0x827202 in __net_initialize_MOD_setup_net_info
	at ../private/net_initialize.f90:95
#0  0x8288e9 in __net_initialize_MOD_setup_net_info
	at ../private/net_initialize.f90:121
#2  0x833503 in __net_burn_const_density_MOD_burn_const_density_1_zone
	at ../private/net_burn_const_density.f90:242
#1  0x833503 in __net_burn_const_density_MOD_burn_const_density_1_zone
	at ../private/net_burn_const_density.f90:242
#3  0x818d15 in __net_lib_MOD_net_1_zone_burn_const_density
	at ../public/net_lib.f90:1090
#2  0x818d15 in __net_lib_MOD_net_1_zone_burn_const_density
	at ../public/net_lib.f90:1090
#3  0x75f660 in burn1_zone
	at ../private/struct_burn_mix.f90:1139
#4  0x75f660 in __struct_burn_mix_MOD_do_burn._omp_fn.0
#4  0x75f660 in burn1_zone
	at ../private/struct_burn_mix.f90:965
	at ../private/struct_burn_mix.f90:1139
#5  0x75f660 in __struct_burn_mix_MOD_do_burn._omp_fn.0
	at ../private/struct_burn_mix.f90:965
#6  0x7f99675bd281 in GOMP_parallel
	at /home/user/sdk2-tmp/build/gcc/libgomp/parallel.c:178
#7  0x76267b in do_burn
	at ../private/struct_burn_mix.f90:949
#8  0x76267b in __struct_burn_mix_MOD_do_struct_burn_mix
	at ../private/struct_burn_mix.f90:113
#5  0x7f99675c5eb5 in gomp_thread_start
	at /home/user/sdk2-tmp/build/gcc/libgomp/team.c:129
#6  0x7f996712a608 in start_thread
	at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477
#7  0x7f996704f132 in ???
	at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#8  0xffffffffffffffff in ???
#9  0x5b4678 in do_step_part2
	at ../private/evolve.f90:682
#10  0x5b8ac9 in __evolve_MOD_do_evolve_step_part2
	at ../private/evolve.f90:483

Desktop (please complete the following information):
I doubt any of this matters, since I got this on two different machines. The example folder was run on:

  • OS: Ubuntu 20.04
  • MESA r23.05.1

> 3Msun star and turns on op_split_burn at logT=7.3.

operator splitting was designed to operate at much higher temperatures (~1e9 K). see section 10.2 of jermyn et al 2023 (mesa VI). it would surprise me if it worked robustly outside of its design range. this said, i agree a segfault could be replaced by a more useful message.
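
as a rough sketch of what "a more useful message" could look like: standard Fortran lets an allocation report failure through `stat=` and `errmsg=` instead of aborting. this is not the code in net_initialize.f90; the module, routine, and variable names below are hypothetical and only illustrate the pattern.

```fortran
module safe_alloc_sketch
   implicit none
contains

   ! Hypothetical helper, not part of MESA: allocate work(n) and report
   ! any problem through ierr rather than letting the runtime abort.
   subroutine alloc_work(n, work, ierr)
      integer, intent(in) :: n
      real(kind(1d0)), allocatable, intent(out) :: work(:)
      integer, intent(out) :: ierr
      character(len=256) :: msg

      ierr = 0

      ! Reject obviously bad sizes before asking for memory.
      if (n <= 0) then
         ierr = -1
         write(*,*) 'alloc_work: bad size n =', n
         return
      end if

      ! stat= makes a failed allocation return a nonzero code
      ! instead of terminating the program.
      allocate(work(n), stat=ierr, errmsg=msg)
      if (ierr /= 0) write(*,*) 'alloc_work: allocate failed: ', trim(msg)
   end subroutine alloc_work

end module safe_alloc_sketch
```

a caller would test ierr after `call alloc_work(n, work, ierr)` and return its own error code, so the run stops with a readable message rather than a backtrace.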

from mwe's out.txt:
net name basic.net

one of operator splitting's design points was assuming the use of large networks. i'd be super surprised if it worked for any of the small, hardwired networks!

Yes, I am aware; the temperature and network choice there was just to speed up the example. In my science applications I'm turning this on at logT=9, which is more reasonable (though lower than the default). The traceback does not change because of that.
However, I am doing it with approx21 (plus cr56), where I still seem to gain some stability for the models, provided I don't ask it to solve for T as well. Is this a direction I should not try? It is not clear to me why this would not work with a small net...

Thanks for catching this segfault. I think this main...fix_const_density_burn_solver fixes the segfault; however, the variable T solver produces bad numbers while operating. Unless we can fix this issue promptly, it's probably best to mark this control as "developmental" for the moment.

Just as a note, the error produced when running with "op_split_burn_min_T_for_variable_T_solver" is:
"net_burn_const_density failed in jakob_or_derivs: bad y 1 NaN"