Virtual method of a class cannot be used in another class's method on gpu hip
Closed this issue · 2 comments
This is demonstrated in `DGNonLinearForm::evalFaceFlux_gpu()` as a minimal example on the branch `gpu_dryair_transport`.
```cpp
// get state and gradient
for (int eq = 0; eq < num_equation; eq++) {
  u1[eq] = d_uk_el1[eq + k * num_equation + iface * maxIntPoints * num_equation];
  u2[eq] = d_uk_el2[eq + k * num_equation + iface * maxIntPoints * num_equation];
}
// evaluate flux
// d_rsolver->Eval_LF(u1, u2, nor, Rflux);
// For some unknown reason, the hip build is very sensitive to
// how Eval_LF is called. If we copy the state to u1, u2 and
// normal to nor and then call as above (which behaves correctly
// for CUDA), the resulting flux is nonsense. But, if we just
// point into d_uk_el1 at the right spots, all is well. This
// problem seems to have something to do with virtual functions
// and pointer arguments, but I don't understand it. However,
// this approach is working.
d_rsolver->Eval_LF(d_uk_el1 + k * num_equation + iface * maxIntPoints * num_equation,
                   d_uk_el2 + k * num_equation + iface * maxIntPoints * num_equation,
                   d_shapeWnor1 + offsetShape1 + maxDofs + 1 + k * (maxDofs + 1 + dim),
                   Rflux);
```
A known problem with this approach is that it is not applicable in other places where the data array indexing order is different. While this works, at least for `d_rsolver->Eval_LF`, following the same approach for `d_flux->ComputeViscousFluxes` does not work (see line 371 in `src/dgNonlinearForm.cpp`). Through a small test on the branch `issue145`, it turns out that the problem is calling other objects' virtual methods, such as `mixture->computeSpeciesEnthalpies` or `transport->ComputeFluxTransportProperties`, within `d_flux->ComputeViscousFluxes`.
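For concreteness, here is a self-contained HIP sketch of that nested-virtual-call pattern, using illustrative class names (`Mixture`, `Fluxes`) rather than the actual TPS classes; both objects are constructed on the device so their vtables hold device addresses. On a well-behaved HIP stack this prints the expected value, so it might serve as a small standalone reproducer of the pattern, though that has not been verified against the gfx803 failure.

```cpp
// Illustrative sketch only (hypothetical classes, not the TPS code).
#include <hip/hip_runtime.h>
#include <cstdio>

struct Mixture {
  __device__ virtual double Enthalpy(double T) const { return 1004.5 * T; }
};

struct Fluxes {
  Mixture *mix;
  // Calls another object's virtual method: the pattern under suspicion.
  __device__ virtual double Viscous(double T) const { return 0.5 * mix->Enthalpy(T); }
};

// Construct both objects on the device with device-side 'new' so that the
// vtable pointers refer to device code.
__global__ void construct(Fluxes **f) {
  Fluxes *flux = new Fluxes();
  flux->mix = new Mixture();
  *f = flux;
}

__global__ void run(Fluxes **f, double *out) { *out = (*f)->Viscous(300.0); }

int main() {
  Fluxes **d_f = nullptr;
  double *d_out = nullptr;
  hipMalloc((void **)&d_f, sizeof(Fluxes *));
  hipMalloc((void **)&d_out, sizeof(double));
  hipLaunchKernelGGL(construct, dim3(1), dim3(1), 0, 0, d_f);
  hipLaunchKernelGGL(run, dim3(1), dim3(1), 0, 0, d_f, d_out);
  double out = 0.0;
  hipMemcpy(&out, d_out, sizeof(double), hipMemcpyDeviceToHost);  // syncs the null stream
  printf("nested virtual call result: %g (expected %g)\n", out, 0.5 * 1004.5 * 300.0);
  hipFree(d_f);
  hipFree(d_out);
  return 0;
}
```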
A few additional observations:

- When all these methods are hard-coded within the function, `d_flux->ComputeViscousFluxes` works.
- If these virtual methods are changed to non-virtual, they work properly on gpu hip. On the branch `issue145` (see line 95 of `src/equation_of_state.cpp`), `GasMixture::computeSpeciesEnthalpies` is defined as non-virtual, which can pass `cyl3d.gpu.test`. (For demonstration purposes, I hard-coded `transport->ComputeFluxTransportProperties` within `d_flux->ComputeViscousFluxes`, on line 283 of `src/fluxes.cpp`, so all tests other than `cyl3d.gpu.test` will fail.)
- Defining these methods to be `const` did not make a difference.
- Both `computeSpeciesEnthalpies` and `ComputeFluxTransportProperties` are defined to be `MFEM_HOST_DEVICE`.
- For an unknown reason, `mixture->ComputePressure` works even within other objects' methods. Since it is defined `inline`, I also tried `inline` for the other methods, but it didn't work.
- Just as with `const RiemannSolver *d_rsolver = rsolver_;`, I attempted to define a pointer `GasMixture *d_mix = mixture;` within the `ComputeViscousFluxes` routine, although this did not work. Perhaps the pointer must be defined outside the `MFEM_FORALL` loop, but this has not been attempted yet.
It seems that the last point is the fundamental reason for the failure. @trevilo, why should these objects be defined outside the `MFEM_FORALL` loop? Do you think there would be any workaround for this?
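For reference, a minimal sketch of one plausible explanation, assuming the usual by-value lambda capture inside `MFEM_FORALL` (the class and member names below are illustrative, not the TPS ones): referring to a member such as `mixture` inside the loop body implicitly captures the host `this` pointer, which is then dereferenced on the device, whereas a local pointer hoisted above the loop is captured by value and is valid on the device.

```cpp
// Sketch with illustrative names, not the TPS classes; MFEM include paths may
// differ depending on the install layout.
#include "mfem.hpp"
#include "general/forall.hpp"  // MFEM_FORALL, MFEM_HOST_DEVICE

using namespace mfem;

struct Mix  // stand-in for a device-resident object such as GasMixture
{
   MFEM_HOST_DEVICE double Enthalpy(double T) const { return 1004.5 * T; }
};

class Form  // stand-in for the class that launches the kernel
{
   Mix *mix_;  // assumed to point at device-accessible memory

public:
   explicit Form(Mix *m) : mix_(m) {}

   void Kernel(int N, const double *d_T, double *d_h) const
   {
      // Problematic: 'mix_' is really 'this->mix_', so the by-value lambda
      // inside MFEM_FORALL captures the host 'this' pointer and dereferences
      // it on the device.
      // MFEM_FORALL(i, N, { d_h[i] = mix_->Enthalpy(d_T[i]); });

      // Preferred: hoist the member into a local before the loop; the lambda
      // then captures the pointer value itself, which is valid on the device.
      Mix *d_mix = mix_;
      MFEM_FORALL(i, N, { d_h[i] = d_mix->Enthalpy(d_T[i]); });
   }
};
```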
On commit 4ff9390cfeaa1a876c0bda712ee5337f72cb0afe, the `GasMixture` object pointer is created outside the `MFEM_FORALL` loop and taken as an input argument of `ComputeViscousFluxes`. In this branch, `TransportProperties::ComputeFluxTransportProperties` is hard-coded within `ComputeViscousFluxes`.

This returns a memory access fault on the gpu hip path:
```
------------------------------------
 _______ _____ _____
|__ __| __ \ / ____|
| | | |__) | (___
| | | ___/ \___ \
| | | | ____) |
|_| |_| |_____/
TPS Version: 1.1 (dev)
Git Version: e798d61
MFEM Version: MFEM v4.3 (release)
------------------------------------
Options used:
--no-version
--runFile inputs/input.dtconst.cyl.ini
--no-debug
Caching input file -> inputs/input.dtconst.cyl.ini
---------------------------------
MFEM Device configuration:
Device configuration: hip,cpu
Memory configuration: host-std,hip
---------------------------------
TPS is using non-GPU-aware MPI.
Fluid is set to the preset 'dry_air'. Input options in [plasma_models] will not be used.
number of elements on rank 0 = 6153
min elements/partition = 6153
max elements/partition = 6153
restart: 0
Initial time-step: 2.53434e-06s
[INLET]: Patch number = 1
[INLET]: Total Surface Area = 9.78349e-03
[INLET]: # of boundary faces = 133
[INLET]: # of participating MPI partitions = 1
[OUTLET]: Patch number = 2
[OUTLET]: Total Surface Area = 0.00978
[OUTLET]: # of boundary faces = 131
[OUTLET]: # of participating MPI partitions = 1
Iteration = 0: wall clock time/iter = 0.000 (secs)
Memory access fault by GPU node-1 (Agent handle: 0x1c6afc0) on address 0x1d9d000. Reason: Page not present or supervisor privilege.
```
It seems that there are similar issues (ROCm/tensorflow-upstream#302) with the `gfx803` architecture.
This issue appears to be hardware-specific. We were seeing the problems on our `gfx803` card, but recent testing on a `gfx90a` (after unifying the `_CUDA_` and `_HIP_` paths in #192) doesn't show any problems. Since `gfx803` is an old card that isn't fully supported anymore, I don't think we should worry about this unless we start seeing issues on newer hardware.