pecos/tps

CUDA nvlink warning and memory error with 'virtual' specifier


On PR #183, a base class Radiation has a method computeEnergySink:

  MFEM_HOST_DEVICE double computeEnergySink(const double &T_h) {
    printf("computeEnergySink not implemented");
    assert(false);
    return 0;
  }

This method is overridden in the derived class NetEmission, so it should be appropriate to declare it virtual.
Doing so, however, produces the following nvlink warnings at the device-link step of the build:

libtool: compile:  /usr/local/cuda/bin/nvcc -dlink -Xcompiler=-fPIC --expt-extended-lambda -arch=sm_75 -ccbin mpicxx -L/usr/local/cuda/lib64 -lcuda -lcudart .libs/averaging_and_rms.o .libs/faceGradientIntegration.o .libs/M2ulPhyS.o .libs/rhs_operator.o .libs/wallBC.o .libs/BCintegrator.o .libs/face_integrator.o .libs/riemann_solver.o .libs/BoundaryCondition.o .libs/fluxes.o .libs/masa_handler.o .libs/run_configuration.o .libs/domain_integrator.o .libs/forcing_terms.o .libs/mpi_groups.o .libs/sbp_integrators.o .libs/equation_of_state.o .libs/transport_properties.o .libs/inletBC.o .libs/outletBC.o .libs/utils.o .libs/io.o .libs/dgNonlinearForm.o .libs/gradients.o .libs/gradNonLinearForm.o .libs/quasimagnetostatic.o .libs/tps.o .libs/chemistry.o .libs/reaction.o .libs/collision_integrals.o .libs/argon_transport.o .libs/source_term.o .libs/gpu_constructor.o .libs/independent_coupling.o .libs/cycle_avg_joule_coupling.o .libs/table.o .libs/radiation.o ../utils/mfem_extras/.libs/pfem_extras.o  -o .libs/tmp_cuda_object.o
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_65__52_tmpxft_000074b4_00000000_7_averaging_and_rms_cpp1_ii_276b271a__ZN4mfem10CuKernel1DIZN9Averaging13addSample_gpuEPNS_15ParGridFunctionES3_RiP10GasMixturePKS2_RKiSA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_60__47_tmpxft_000074b3_00000000_7_rhs_operator_cpp1_ii_bcfc24c1__ZN4mfem10CuKernel1DIZNK11RHSoperator16updatePrimitivesERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_60__47_tmpxft_000074b3_00000000_7_rhs_operator_cpp1_ii_bcfc24c1__ZN4mfem10CuKernel1DIZNK11RHSoperator16updatePrimitivesERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_60__47_tmpxft_000074b3_00000000_7_rhs_operator_cpp1_ii_bcfc24c1__ZN4mfem10CuKernel1DIZNK11RHSoperator11GetFlux_gpuERKNS_6VectorERNS_11DenseTensorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_54__41_tmpxft_000074ba_00000000_7_wallBC_cpp1_ii_b51a6334__ZN4mfem10CuKernel2DIZN6WallBC15interpWalls_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_RKiEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_55__42_tmpxft_00007d33_00000000_7_inletBC_cpp1_ii_75d6e0ab__ZN4mfem10CuKernel2DIZN7InletBC15interpInlet_gpuERKNS_6VectorERKNS_5ArrayIiEES8_RS2_S9_RS6_SA_SA_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_56__43_tmpxft_00007dfc_00000000_7_outletBC_cpp1_ii_2164ab1c__ZN4mfem10CuKernel2DIZN8OutletBC16interpOutlet_gpuERKNS_6VectorERKNS_5ArrayIiEES8_PNS_15ParGridFunctionESA_RS2_SB_RS6_SC_SC_EUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel2DIZN15DGNonLinearForm27sharedFaceInterpolation_gpuERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel2DIZN15DGNonLinearForm27sharedFaceInterpolation_gpuERKNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel1DIZN15DGNonLinearForm16evalFaceFlux_gpuEvEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_63__50_tmpxft_00007f52_00000000_7_dgNonlinearForm_cpp1_ii_806096d2__ZN4mfem10CuKernel1DIZN15DGNonLinearForm16evalFaceFlux_gpuEvEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_59__46_tmpxft_000005f1_00000000_7_source_term_cpp1_ii_07df05d6__ZN4mfem10CuKernel1DIZN10SourceTerm11updateTermsERNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_59__46_tmpxft_000005f1_00000000_7_source_term_cpp1_ii_07df05d6__ZN4mfem10CuKernel1DIZN10SourceTerm11updateTermsERNS_6VectorEEUliE_EEviT_' cannot be statically determined
nvlink warning : Stack size for entry function '__nv_static_59__46_tmpxft_000005f1_00000000_7_source_term_cpp1_ii_07df05d6__ZN4mfem10CuKernel1DIZN10SourceTerm11updateTermsERNS_6VectorEEUliE_EEviT_' cannot be statically determined

Furthermore, it causes a runtime error even when the Radiation class is never used, as shown below (from test/cyl3d.gpu.test):

CUDA error: (cudaGetLastError()) failed with error:
 --> an illegal memory access was encountered
 ... in function: void mfem::CuWrap1D(int, DBODY&&) [with int BLCK = 256; DBODY = __nv_dl_wrapper_t<__nv_dl_tag<void (RHSoperator::*)(const mfem::Vector&,     mfem::DenseTensor&) const, &RHSoperator::GetFlux_gpu, 1>, const int, const double*, const int, const int, const double*, Fluxes*, const bool, const double*,   const double*, double*>&]
 ... in file: /home/karl/sw/mfem-gpu-4.4/include/mfem/general/forall.hpp:396
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0


CUDA error: (cudaFree(dptr)) failed with error:
 --> driver shutting down
 ... in function: void* mfem::CuMemFree(void*)
 ... in file: general/cuda.cpp:86

Similar issues have been observed when implementing GPU routines for other classes (e.g. #148), though in those cases it was still possible to declare the base class methods virtual. At present this appears to be a CUDA issue rather than an issue in tps itself.

Until this issue is resolved, any new class that will run on the device should be implemented with caution. Currently, the only way to arrive at working code is trial and error: adding or removing the virtual specifier on the newly added class methods until the warnings and the runtime error disappear.

Since LinearTable is the only option at the moment, this issue can be avoided by having the necTable_ object in the NetEmission class have type LinearTable rather than TableInterpolator *. This change (implemented in 30dbe9c) eliminates the bad behavior described above.

The benefit of this approach is that computeEnergySink can be made virtual, which is necessary to get the desired behavior, even on the CPU. This lets us move forward, but since it supports only LinearTable, we will have to revisit it if we ever want to generalize.
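
For readers who have not looked at 30dbe9c, the shape of the workaround can be sketched roughly as follows. This is only an illustration and not the tps code: LinearTableSketch, RadiationSketch, NetEmissionSketch and demoEnergySink are hypothetical names, the table layout and the returned value are made up, and the fallback definition of MFEM_HOST_DEVICE exists only so the sketch builds on the host without MFEM or nvcc. The one point it is meant to show is that necTable_ is a concrete member held by value, so the only virtual call left is computeEnergySink itself.

```cpp
#include <cassert>
#include <cstdio>

// MFEM normally provides this macro; the fallback is an assumption so the
// sketch compiles as plain host C++.
#ifndef MFEM_HOST_DEVICE
#define MFEM_HOST_DEVICE
#endif

// Hypothetical stand-in for LinearTable: linear interpolation on a uniform grid.
struct LinearTableSketch {
  static const int kMaxSize = 8;
  double x0;           // first abscissa
  double dx;           // uniform spacing
  double f[kMaxSize];  // tabulated values
  int n;               // number of entries in use

  MFEM_HOST_DEVICE double eval(double x) const {
    assert(n >= 2 && n <= kMaxSize);
    const double s = (x - x0) / dx;
    int i = static_cast<int>(s);
    if (i < 0) i = 0;          // extrapolate from the first interval
    if (i > n - 2) i = n - 2;  // extrapolate from the last interval
    const double w = s - i;
    return (1.0 - w) * f[i] + w * f[i + 1];
  }
};

// Hypothetical stand-in for the Radiation base class quoted in the issue.
class RadiationSketch {
 public:
  MFEM_HOST_DEVICE virtual ~RadiationSketch() {}
  MFEM_HOST_DEVICE virtual double computeEnergySink(const double &T_h) {
    (void)T_h;
    printf("computeEnergySink not implemented\n");
    assert(false);
    return 0;
  }
};

// Hypothetical stand-in for NetEmission. The table is stored by value with a
// concrete type, rather than as a pointer to an interpolator interface.
class NetEmissionSketch : public RadiationSketch {
 public:
  MFEM_HOST_DEVICE explicit NetEmissionSketch(const LinearTableSketch &table)
      : necTable_(table) {}

  // Illustrative only: the real energy sink formula in tps may differ.
  MFEM_HOST_DEVICE double computeEnergySink(const double &T_h) override {
    return necTable_.eval(T_h);
  }

 private:
  LinearTableSketch necTable_;
};

// Small host-side driver: dispatches through the base class.
double demoEnergySink(double T) {
  LinearTableSketch table = {1000.0, 1000.0, {10.0, 20.0, 40.0}, 3};
  NetEmissionSketch model(table);
  RadiationSketch *rad = &model;
  return rad->computeEnergySink(T);
}
```

Calling demoEnergySink(1500.0) interpolates halfway between the first two made-up table entries. Whether this layout avoids the nvlink warnings on a given CUDA toolchain still has to be checked by building for the device; the sketch only shows the structure of the change.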