Why is GPU implementation significantly slower than CPU?
jinshanmu opened this issue · 1 comment
jinshanmu commented
I was trying the GPU example script:
from devito import *
import numpy as np
import matplotlib.pyplot as plt

nx, ny = 100, 100
grid = Grid(shape=(nx, ny))

# Diffusing field, keeping all 200 time steps in memory
u = TimeFunction(name='u', grid=grid, space_order=2, save=200)
c = Constant(name='c')

# Diffusion equation: du/dt = c * laplacian(u)
eqn = Eq(u.dt, c * u.laplace)
step = Eq(u.forward, solve(eqn, u.forward))

# Initial condition: a ring centred at (0.5, 0.5)
xx, yy = np.meshgrid(np.linspace(0., 1., nx, dtype=np.float32),
                     np.linspace(0., 1., ny, dtype=np.float32))
r = (xx - .5) ** 2. + (yy - .5) ** 2.
u.data[0, np.logical_and(.05 <= r, r <= .1)] = 1.

op = Operator([step])
stats = op.apply(dt=5e-05, c=.5)

# Plot a few snapshots of the solution
plt.rcParams['figure.figsize'] = (20, 20)
for i in range(1, 6):
    plt.subplot(1, 6, i)
    plt.imshow(u.data[(i - 1) * 40])
plt.show()
The CPU version, op = Operator([step]), returned:
Operator `Kernel` ran in 0.01 s
However, the GPU version, op = Operator([step], platform='nvidiaX', opt=('advanced', {'gpu-fit': u})), returned:
Operator `Kernel` ran in 4.74 s
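For reference, a minimal sketch of how one might inspect what Devito generated and reported for the GPU run (the names op_gpu and summary below are just illustrative, not from the run above):

# Illustrative sketch only: inspect the GPU operator and its reported timings.
op_gpu = Operator([step], platform='nvidiaX', opt=('advanced', {'gpu-fit': u}))
print(op_gpu)                          # generated C code; the offload pragmas should appear here
summary = op_gpu.apply(dt=5e-05, c=.5)
print(summary)                         # per-section timings measured by Devito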
My CPU is an Intel Xeon Gold 6133 (80 cores in total). My GPU is an NVIDIA GeForce RTX 4080 with CUDA 11.8 and the NVIDIA HPC SDK 22.11, which works normally for other programs (e.g. PyTorch).
Any idea what is going on here?
Thank you in advance!
jinshanmu commented
Moved this to the Discussions section of devitocodes.