Faster Implicit update
jdonners opened this issue · 1 comments
jdonners commented
Massimiliano Fatica (NVIDIA) wrote:
While I was adding the GPU path to the Implicit update routines, I found a 2x-3x speedup for the CPU code by making better use of the dgttrs call.
Basically, after the dgttrf call, instead of solving each vertical line separately:
    do ic=xstart(3),xend(3)
      do jc=xstart(2),xend(2)
        ! Normalize RHS of equation
        fkl(1)= real(0.,fp_kind)
        do kc=2,nxm
          ackl_b=real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
          fkl(kc)=rhs(kc,jc,ic)*ackl_b
        end do
        fkl(nx)= real(0.,fp_kind)
        ! Solve equation using LAPACK library
        call dgttrs('N',nx,1,amkT,ackT,apkT,appk,ipkv,fkl,nx,info)
        ! Update temperature field
        do kc=2,nxm
          temp(kc,jc,ic) = temp(kc,jc,ic) + fkl(kc)
        end do
      end do
    end do
you can solve all of them together in a single dgttrs call, passing every vertical line as a separate right-hand side:
    nrhs=(xend(3)-xstart(3)+1)*(xend(2)-xstart(2)+1)
    ! Normalize RHS (this should be moved into the main loop of the corresponding ImplicitUpdate routine)
    do ic=xstart(3),xend(3)
      do jc=xstart(2),xend(2)
        ! Zero the boundary rows to match fkl(1) and fkl(nx) in the per-line version
        rhs(1,jc,ic)=real(0.,fp_kind)
        do kc=2,nxm
          ackl_b=real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
          rhs(kc,jc,ic)=rhs(kc,jc,ic)*ackl_b
        end do
        rhs(nx,jc,ic)=real(0.,fp_kind)
      end do
    end do
    ! Solve all vertical lines at once: rhs is treated as an nx-by-nrhs matrix
    call dgttrs('N',nx,nrhs,amkT,ackT,apkT,appk,ipkv,rhs,nx,info)
    ! Update temperature field (you can also add OpenMP directives on these loops)
    do ic=xstart(3),xend(3)
      do jc=xstart(2),xend(2)
        do kc=2,nxm
          temp(kc,jc,ic)=temp(kc,jc,ic) + rhs(kc,jc,ic)
        end do
      end do
    end do
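For anyone who wants to check the idea in isolation, here is a minimal standalone sketch, not the code from the repository. It factors a small tridiagonal matrix once with dgttrf, then solves the same right-hand sides both ways (one dgttrs call per line, and a single dgttrs call with nrhs equal to the number of lines) and compares the results. The grid sizes, the matrix coefficients, and the names batched_tridiag_demo and solve_lines are all assumptions made up for illustration. It relies on the right-hand-side array being contiguous with leading dimension nx, as the snippet above also does.

```fortran
! Sketch only: sizes, coefficients and names are illustrative assumptions.
program batched_tridiag_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 8, ny = 4, nz = 3       ! hypothetical grid sizes
  real(dp) :: dl(nx-1), d(nx), du(nx-1), du2(nx-2)   ! tridiagonal matrix / LU factors
  real(dp) :: rhs(nx,ny,nz), ref(nx,ny,nz), maxdiff
  integer  :: ipiv(nx), info, ic, jc, kc
  external :: dgttrf, dgttrs

  ! Diagonally dominant stand-in for the implicit operator
  dl = -1.0_dp
  d  =  4.0_dp
  du = -1.0_dp

  ! Factor once; every solve below reuses these factors
  call dgttrf(nx, dl, d, du, du2, ipiv, info)
  if (info /= 0) error stop 'dgttrf failed'

  ! Arbitrary right-hand sides, one vertical line per (jc,ic) pair
  do ic = 1, nz
    do jc = 1, ny
      do kc = 1, nx
        rhs(kc,jc,ic) = real(kc + 10*jc + 100*ic, dp)
      end do
    end do
  end do
  ref = rhs

  ! Per-line version: one dgttrs call per vertical line
  do ic = 1, nz
    do jc = 1, ny
      call solve_lines(nx, 1, ref(:,jc,ic))
    end do
  end do

  ! Batched version: a single dgttrs call for all ny*nz lines
  call solve_lines(nx, ny*nz, rhs)

  maxdiff = maxval(abs(rhs - ref))
  if (maxdiff > 1.0e-10_dp) error stop 'batched and per-line solutions differ'
  print '(a)', 'batched and per-line solutions agree'

contains

  ! View b as an n-by-m matrix (sequence association) and solve in place
  subroutine solve_lines(n, m, b)
    integer,  intent(in)    :: n, m
    real(dp), intent(inout) :: b(n, m)
    call dgttrs('N', n, m, dl, d, du, du2, ipiv, b, n, info)
    if (info /= 0) error stop 'dgttrs failed'
  end subroutine solve_lines

end program batched_tridiag_demo
```

This needs to be linked against a LAPACK library, for example with something like gfortran demo.f90 -llapack (the file name is arbitrary). The program does not measure the speedup; it only checks that the batched call gives the same answer as the per-line loop.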
stevensrjam commented
Committed in commit 294.