CUDA nvlink warning and memory error with 'virtual' specifier
Closed this issue · 1 comments
On PR #183, a base class Radiation has a method computeEnergySink:
MFEM_HOST_DEVICE double computeEnergySink(const double &T_h) {
  printf("computeEnergySink not implemented");
  assert(false);
  return 0;
}
This is clearly overridden in the derived class NetEmission, so it should be appropriate to declare this method virtual.
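For reference, a minimal host-side sketch of the change being discussed. The class and method names come from this issue; the body of the NetEmission override and the empty fallback definition of MFEM_HOST_DEVICE are placeholders assumed here only so the snippet compiles without MFEM or CUDA. It is not the tps implementation.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: outside an MFEM CUDA build the macro expands to nothing.
#ifndef MFEM_HOST_DEVICE
#define MFEM_HOST_DEVICE
#endif

class Radiation {
 public:
  MFEM_HOST_DEVICE virtual ~Radiation() {}

  // Base version: a placeholder that should never be reached.
  MFEM_HOST_DEVICE virtual double computeEnergySink(const double &T_h) {
    printf("computeEnergySink not implemented");
    assert(false);
    return 0;
  }
};

class NetEmission : public Radiation {
 public:
  // Hypothetical body for illustration; the real class evaluates a table.
  MFEM_HOST_DEVICE double computeEnergySink(const double &T_h) override {
    return -2.0 * T_h;
  }
};
```

With virtual in place, a call through a Radiation pointer that actually points to a NetEmission object dispatches to the derived method. Without it, the same call resolves to the base placeholder and hits the assert.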
Doing so, however, causes nvlink warnings at the device-link step of the build, as below:
libtool: compile: /usr/local/cuda/bin/nvcc -dlink -Xcompiler=-fPIC --expt-extended-lambda -arch=sm_75 -ccbin mpicxx -L/usr/local/cuda/lib64 -lcuda -lcudart .libs/averaging_and_rms.o .libs/faceGradientIntegration.o .libs/M2ulPhyS.o .libs/rhs_operator.o .libs/wallBC.o .libs/BCintegrator.o .libs/face_integrator.o .libs/riemann_solver.o .libs/BoundaryCondition.o .libs/fluxes.o .libs/masa_handler.o .libs/run_configuration.o .libs/domain_integrator.o .libs/forcing_terms.o .libs/mpi_groups.o .libs/sbp_integrators.o .libs/equation_of_state.o .libs/transport_properties.o .libs/inletBC.o .libs/outletBC.o .libs/utils.o .libs/io.o .libs/dgNonlinearForm.o .libs/gradients.o .libs/gradNonLinearForm.o .libs/quasimagnetostatic.o .libs/tps.o .libs/chemistry.o .libs/reaction.o .libs/collision_integrals.o .libs/argon_transport.o .libs/source_term.o .libs/gpu_constructor.o .libs/independent_coupling.o .libs/cycle_avg_joule_coupling.o .libs/table.o .libs/radiation.o ../utils/mfem_extras/.libs/pfem_extras.o -o .libs/tmp_cuda_object.o
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_60__47_tmpxft_000074b3_00000000_7_rhs_operator_cpp1_ii_bcfc24c1__ZN4mfem10CuKernel1DIZNK11RHSoperator16updatePrimitivesERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_60__47_tmpxft_000074b3_00000000_7_rhs_operator_cpp1_ii_bcfc24c1__ZN4mfem10CuKernel1DIZNK11RHSoperator16updatePrimitivesERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_60__47_tmpxft_000074b3_00000000_7_rhs_operator_cpp1_ii_bcfc24c1__ZN4mfem10CuKernel1DIZNK11RHSoperator11GetFlux_gpuERKNS_6VectorERNS_11DenseTensorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_55__42_tmpxft_00007d33_00000000_7_inletBC_cpp1_ii_75d6e0ab__ZN4mfem10CuKernel2DIZN7InletBC15interpInlet_gpuERKNS_6VectorERKNS_5ArrayIiEES8_RS2_S9_RS6_SA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_56__43_tmpxft_00007dfc_00000000_7_outletBC_cpp1_ii_2164ab1c__ZN4mfem10CuKernel2DIZN8OutletBC16interpOutlet_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_SC_SC_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel2DIZN15DGNonLinearForm27sharedFaceInterpolation_gpuERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel2DIZN15DGNonLinearForm27sharedFaceInterpolation_gpuERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel1DIZN15DGNonLinearForm16evalFaceFlux_gpuEvEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel1DIZN15DGNonLinearForm16evalFaceFlux_gpuEvEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_59__46_tmpxft_000005f1_00000000_7_source_term_cpp1_ii_07df05d6__ZN4mfem10CuKernel1DIZN10SourceTerm11updateTermsERNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_59__46_tmpxft_000005f1_00000000_7_source_term_cpp1_ii_07df05d6__ZN4mfem10CuKernel1DIZN10SourceTerm11updateTermsERNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_59__46_tmpxft_000005f1_00000000_7_source_term_cpp1_ii_07df05d6__ZN4mfem10CuKernel1DIZN10SourceTerm11updateTermsERNS_6VectorEEUliE_EEviT_' cannot be statically determined
Furthermore, it causes a runtime error even when the Radiation class is never used, as below (from test/cyl3d.gpu.test):
CUDA error: (cudaGetLastError()) failed with error:
--> an illegal memory access was encountered
... in function: void mfem::CuWrap1D(int, DBODY&&) [with int BLCK = 256; DBODY = __nv_dl_wrapper_t<__nv_dl_tag<void (RHSoperator::*)(const mfem::Vector&, mfem::DenseTensor&) const, &RHSoperator::GetFlux_gpu, 1>, const int, const double*, const int, const int, const double*, Fluxes*, const bool, const double*, const double*, double*>&]
... in file: /home/karl/sw/mfem-gpu-4.4/include/mfem/general/forall.hpp:396
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
CUDA error: (cudaFree(dptr)) failed with error:
--> driver shutting down
... in function: void* mfem::CuMemFree(void*)
... in file: general/cuda.cpp:86
Similar issues have been observed when implementing GPU routines for other classes (e.g. #148), though in those cases it was still possible to declare base class methods virtual. For now this appears to be a CUDA issue rather than a problem in tps itself.
Until this issue is resolved, any new class that will run on the device should be implemented with caution. Currently, the only way to find working code is trial and error: adding or removing the virtual specifier on the newly added class methods.
Since LinearTable is the only option at the moment, this issue can be avoided by having the necTable_ object in the NetEmission class have type LinearTable rather than TableInterpolator *. This change (implemented in 30dbe9c) eliminates the bad behavior described above.
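A rough sketch of the member-type change, not the code from 30dbe9c. The names NetEmission, LinearTable, TableInterpolator, necTable_ and computeEnergySink come from this thread. Everything else is assumed for illustration: the eval method, the two-point table, the inheritance of LinearTable from TableInterpolator, the sign convention, and the empty fallback for MFEM_HOST_DEVICE.

```cpp
#include <cassert>

// Assumption: outside an MFEM CUDA build the macro expands to nothing.
#ifndef MFEM_HOST_DEVICE
#define MFEM_HOST_DEVICE
#endif

// Hypothetical stand-in for the abstract interpolator interface.
class TableInterpolator {
 public:
  MFEM_HOST_DEVICE virtual ~TableInterpolator() {}
  MFEM_HOST_DEVICE virtual double eval(const double &x) const = 0;
};

// Hypothetical two-point linear table, only to make the sketch compile.
class LinearTable : public TableInterpolator {
 public:
  MFEM_HOST_DEVICE LinearTable(double x0, double f0, double x1, double f1)
      : x0_(x0), f0_(f0), x1_(x1), f1_(f1) {}

  MFEM_HOST_DEVICE double eval(const double &x) const override {
    const double t = (x - x0_) / (x1_ - x0_);
    return f0_ + t * (f1_ - f0_);
  }

 private:
  double x0_, f0_, x1_, f1_;
};

class NetEmission {
 public:
  explicit NetEmission(const LinearTable &table) : necTable_(table) {}

  MFEM_HOST_DEVICE double computeEnergySink(const double &T_h) {
    // The member is a concrete object, so this call needs no pointer-based
    // dynamic dispatch. The minus sign is a placeholder convention.
    return -necTable_.eval(T_h);
  }

 private:
  LinearTable necTable_;  // before the change: TableInterpolator *necTable_;
};
```

Holding the table by value fixes its concrete type at compile time, which is one plausible reason the change sidesteps the problem. The cost is the one noted in this thread: only LinearTable is supported.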
The benefit of this approach is that computeEnergySink can be made virtual, which is necessary to get the desired behavior, even on the CPU. This allows us to move forward. But since it only supports LinearTable, we will have to revisit this if we ever want to generalize.