PhysicsofFluids/AFiD

Faster Implicit update

jdonners opened this issue · 1 comment

Massimiliano Fatica (NVIDIA) wrote:

While adding the GPU path to the Implicit update routines, I found a 2x-3x speedup for the CPU code from making better use of the dgttrs call.

Basically, after the dgttrf call, instead of solving each vertical line:

      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)

!     Normalize RHS of equation

          fkl(1)=real(0.,fp_kind)
          do kc=2,nxm
            ackl_b=real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
            fkl(kc)=rhs(kc,jc,ic)*ackl_b
          end do
          fkl(nx)=real(0.,fp_kind)

!     Solve equation using LAPACK library

          call dgttrs('N',nx,1,amkT,ackT,apkT,appk,ipkv,fkl,nx,info)

!     Update temperature field

          do kc=2,nxm
            temp(kc,jc,ic)=temp(kc,jc,ic)+fkl(kc)
          end do

        end do
      end do

you can solve all of them together, passing every vertical line as a separate right-hand side in a single dgttrs call:

      nrhs=(xend(3)-xstart(3)+1)*(xend(2)-xstart(2)+1)

!     Normalize RHS (this should be moved into the main loop of the
!     corresponding ImplicitUpdate routine)
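      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)
!     Zero the boundary rows to match fkl(1) and fkl(nx) in the
!     per-line version
          rhs(1,jc,ic)=real(0.,fp_kind)
          do kc=2,nxm
            ackl_b=real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
            rhs(kc,jc,ic)=rhs(kc,jc,ic)*ackl_b
          end do
          rhs(nx,jc,ic)=real(0.,fp_kind)
        end do
      end do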

      call dgttrs('N',nx,nrhs,amkT,ackT,apkT,appk,ipkv,rhs,nx,info)

! You can also add OpenMP directives on these loops
      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)
          do kc=2,nxm
            temp(kc,jc,ic)=temp(kc,jc,ic)+rhs(kc,jc,ic)
          end do
        end do
      end do
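
For anyone who wants to convince themselves that the batched call gives the same answer as the per-line loop, here is a minimal standalone sketch. It is not AFiD code: the array sizes, the matrix entries and the names (dl, d, du, du2, ipiv, ref, col) are made up for illustration, and it assumes the right-hand sides are stored contiguously with leading dimension nx, so that each (jc,ic) line is one column of an nx-by-nrhs matrix. It needs to be linked against LAPACK (for example with -llapack).

```fortran
program batched_tridiag_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes, chosen only for this demo
  integer, parameter :: nx = 8, ny = 3, nz = 4
  real(dp) :: dl(nx-1), d(nx), du(nx-1), du2(nx-2)
  real(dp) :: rhs(nx,ny,nz), ref(nx,ny,nz), col(nx)
  real(dp) :: max_diff
  integer  :: ipiv(nx)
  integer  :: i, j, k, info, nrhs
  external :: dgttrf, dgttrs

  ! Made-up diagonally dominant tridiagonal matrix
  dl = -1.0_dp
  d  =  4.0_dp
  du = -1.0_dp

  ! Arbitrary right-hand sides, one vertical line per (j,i) pair
  do i = 1, nz
    do j = 1, ny
      do k = 1, nx
        rhs(k,j,i) = real(k + 10*j + 100*i, dp)
      end do
    end do
  end do

  ! Factor once; the factors are reused by every solve below
  call dgttrf(nx, dl, d, du, du2, ipiv, info)
  if (info /= 0) error stop "dgttrf failed"

  ! Per-line version: one dgttrs call per vertical line
  do i = 1, nz
    do j = 1, ny
      col = rhs(:,j,i)
      call dgttrs('N', nx, 1, dl, d, du, du2, ipiv, col, nx, info)
      if (info /= 0) error stop "dgttrs (per-line) failed"
      ref(:,j,i) = col
    end do
  end do

  ! Batched version: all ny*nz lines as columns of one nx-by-nrhs matrix
  nrhs = ny*nz
  call dgttrs('N', nx, nrhs, dl, d, du, du2, ipiv, rhs, nx, info)
  if (info /= 0) error stop "dgttrs (batched) failed"

  ! Compare the two results
  max_diff = maxval(abs(rhs - ref))
  print *, 'max |batched - per-line| =', max_diff
  if (max_diff > 1.0e-10_dp) error stop "results differ"
end program batched_tridiag_demo
```

The batched call overwrites rhs with the solution, so this approach only works if rhs is not needed afterwards, which is the case in the snippet above where it is only added to temp.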

Committed in commit 294.