GlobalArrays/ga

Can we add the TARGET attribute to the arrays in mafdecls.fh?

Closed this issue · 5 comments

Adding the TARGET attribute to arrays such as INT_MB and DBL_MB allows Fortran programmers to point pointers at segments of those arrays. The MA memory can then be accessed just like an ordinary Fortran array. To me this allows for more concise and clearer code than doing explicit offset calculations. See the sample code (and modified mafdecls.fh) in target_pr.zip.
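To make the contrast concrete, here is a minimal free-form sketch that reads the same block once with explicit offset arithmetic and once through a pointer. The array `work` is only a stand-in for DBL_MB, and `ioffset` and `n` are hypothetical values chosen for illustration; this is not the code from target_pr.zip.

```fortran
program pointer_view_sketch
  implicit none
  ! Stand-in for the MA work array (assumption: plays the role of DBL_MB).
  double precision, target :: work(1000)
  double precision, pointer :: a(:,:)
  integer :: ioffset, i, j, n
  double precision :: sum_offset, sum_pointer

  n = 10
  ioffset = 101   ! hypothetical offset, as an MA index would supply
  do i = 1, size(work)
    work(i) = dble(i)
  end do

  ! Explicit offset arithmetic: element (i,j) of an n x n column-major block.
  sum_offset = 0.0d0
  do j = 1, n
    do i = 1, n
      sum_offset = sum_offset + work(ioffset + (i-1) + (j-1)*n)
    end do
  end do

  ! Pointer view of the same block (Fortran 2003 pointer bounds remapping).
  a(1:n,1:n) => work(ioffset:ioffset+n*n-1)
  sum_pointer = 0.0d0
  do j = 1, n
    do i = 1, n
      sum_pointer = sum_pointer + a(i,j)
    end do
  end do

  if (sum_offset /= sum_pointer) error stop "views disagree"
  print *, "both views agree"
end program pointer_view_sketch
```

Both loops touch the same 100 elements in the same order; only the indexing notation differs.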

This may have a severe negative impact on performance if it turns off compiler optimizations. Please provide NWChem benchmarks with Intel, PGI, IBM XL, and GCC Fortran compilers to show that this has zero impact on performance.

Honestly, I see no reason to improve MA. NWChem needs to deprecate that API and move to modern Fortran using allocate. That is the only reasonable thing to do if you want to use Fortran pointers.

Note that the motivation for MA is largely gone, because Fortran 95+ compilers are widely available and ARMCI is far less dependent on registered memory slabs than it used to be. The ARMCI back-end that benefits from MA_USES_ARMCI_MEM is Mellanox InfiniBand, but recent hardware supports ODP (On-Demand Paging).

What would that benchmark look like? I can compile a regular version of NWChem, and then I can compile a version of NWChem where I have hacked mafdecls.fh. Run a bunch of calculations with both and see if there is a performance difference. Is that the benchmark you were thinking of?

Another benchmark would be to convert a piece of code to use pointers instead of directly accessing dbl_mb and compare the performance of the two implementations. This is much closer to what you really would want to know, but it takes a non-trivial amount of work (unless we can identify a single subroutine that takes a substantial amount of time and would be very sensitive to the performance difference resulting from this change). If you have any suggestions I could try that.
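A rough sketch of what such a comparison might look like in isolation is given below. It times the same update loop on an array without TARGET using offset indexing and on a TARGET array through pointers. The arrays `plain` and `tgt`, the sizes `n` and `nrep`, and the loop itself are assumptions made for illustration; a toy loop like this says little about real NWChem kernels, so treat it only as a starting point.

```fortran
program target_timing_sketch
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1000000   ! vector length (arbitrary choice)
  integer, parameter :: nrep = 50     ! repetitions (arbitrary choice)
  double precision, allocatable :: plain(:)         ! stand-in without TARGET
  double precision, allocatable, target :: tgt(:)   ! stand-in with TARGET
  double precision, pointer :: x(:), y(:)
  integer :: ix, iy, i, irep
  integer(int64) :: t0, t1, rate
  double precision :: check_plain, check_ptr

  allocate(plain(2*n), tgt(2*n))
  plain = 1.0d0
  tgt = 1.0d0
  ix = 1        ! hypothetical offsets into the work arrays
  iy = n + 1

  ! Variant 1: explicit offset indexing into the non-TARGET array.
  call system_clock(t0, rate)
  do irep = 1, nrep
    do i = 0, n - 1
      plain(iy+i) = plain(iy+i) + 0.5d0*plain(ix+i)
    end do
  end do
  call system_clock(t1)
  check_plain = sum(plain(iy:iy+n-1))
  print *, "offset  seconds:", dble(t1 - t0)/dble(rate)

  ! Variant 2: the same update through pointers into the TARGET array.
  x => tgt(ix:ix+n-1)
  y => tgt(iy:iy+n-1)
  call system_clock(t0)
  do irep = 1, nrep
    do i = 1, n
      y(i) = y(i) + 0.5d0*x(i)
    end do
  end do
  call system_clock(t1)
  check_ptr = sum(y)
  print *, "pointer seconds:", dble(t1 - t0)/dble(rate)

  ! Both variants perform the same arithmetic, so the checksums must match.
  if (check_plain /= check_ptr) error stop "results differ"
  deallocate(plain, tgt)
end program target_timing_sketch
```

Timings from a loop this simple depend heavily on compiler and optimization level, which is why whole-application runs and compiler optimization reports remain the more convincing evidence.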

Otherwise, I agree that the MA business is way past its sell-by date. Allocatable arrays would be much more convenient, and current Fortran programmers would understand what is happening. However, there are many lines of code dealing with these memory arrays, and I am not sure where we are going to get the effort from to replace all of that.

> What would that benchmark look like? I can compile a regular version of NWChem, and then I can compile a version of NWChem where I have hacked mafdecls.fh. Run a bunch of calculations with both and see if there is a performance difference. Is that the benchmark you were thinking of?

Yes.

It would also be useful to see compiler diagnostics (e.g. ifort -qopt-report=5) to be sure that TARGET doesn't disable vectorization and other transformations.

> Another benchmark would be to convert a piece of code to use pointers instead of directly accessing dbl_mb and compare the performance of the two implementations. This is much closer to what you really would want to know, but it takes a non-trivial amount of work (unless we can identify a single subroutine that takes a substantial amount of time and would be very sensitive to the performance difference resulting from this change). If you have any suggestions I could try that.

I agree this is more work. If we see no change in the previous experiments, I guess we can skip it. However, if we see perturbations in the NWChem benchmarks, we need to find the root cause somehow.

> Otherwise, I agree that the MA business is way past its sell-by date. Allocatable arrays would be much more convenient, and current Fortran programmers would understand what is happening. However, there are many lines of code dealing with these memory arrays, and I am not sure where we are going to get the effort from to replace all of that.

If you need TARGET, that means you are writing new code with Fortran pointers. I don't understand why ALLOCATE and DEALLOCATE are a burden for new code. It hasn't been anywhere close to a bottleneck when modernizing existing NWChem CC codes.

Correct, TARGET would be used with new code I am writing. At the moment it uses ALLOCATE and DEALLOCATE, as it is experimental anyway. The TARGET trick would be a way to port that code back to using MA (once it is all finished). Clearly, if ALLOCATE and DEALLOCATE are acceptable now, that saves me a load of hassle downstream, and we don't even need to think about any of this. That resolution is perfect from my point of view.
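For comparison, the ALLOCATE/DEALLOCATE route discussed here needs no MA handle or offset at all. The following is a minimal sketch with hypothetical names (`scratch`, `n`, `ierr`), not taken from NWChem:

```fortran
program allocate_sketch
  implicit none
  double precision, allocatable :: scratch(:,:)
  integer :: n, ierr

  n = 10   ! hypothetical block dimension
  allocate(scratch(n,n), stat=ierr)
  if (ierr /= 0) error stop "allocation failed"

  scratch = 0.0d0
  scratch(3,4) = 1.0d0   ! ordinary two-index access, no offset arithmetic
  print *, "sum =", sum(scratch)

  deallocate(scratch)
end program allocate_sketch
```

If a pointer view onto such an array is wanted later, the array itself can be declared with the TARGET attribute, which keeps the change local to the new code.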

Actually, it is not necessary to add TARGET to mafdecls.fh. One can just add TARGET to the code where you want to use pointers:

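#include "mafdecls.fh"
!     Give dbl_mb the TARGET attribute locally so pointers may refer to it
      target dbl_mb
      double precision, pointer :: array(:,:)
!     View 100 consecutive elements starting at ioffset as a 10x10 matrix
      array(1:10,1:10) => dbl_mb(ioffset:ioffset+100-1)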

This works with gfortran 7.2.0, and the GNU compilers are usually reasonably picky.