NOAA-GFDL/MOM6

make -j in ac/deps Error: Can't open included file 'mpif-config.h' [Makefile.dep:157: mpp_data.o] Error 1 in cluster env with modules


I'm using RHEL 8 on a cluster where we can load modules ad hoc; however, I am getting the errors below from make -j in ~/MOM/ac/deps:

 make -j
make -C fms/build libFMS.a
make[1]: Entering directory '/path/to/MOM6/ac/deps/fms/build'
gfortran -DPACKAGE_NAME=\"FMS\" -DPACKAGE_TARNAME=\"fms\" -DPACKAGE_VERSION=\"\ \" -DPACKAGE_STRING=\"FMS\ \ \" -DPACKAGE_BUGREPORT=\"https://github.com/NOAA-GFDL/FMS/issues\" -DPACKAGE_URL=\"\" -DHAVE_SCHED_GETAFFINITY=1 -Duse_libMPI=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_NETCDF_H=1 -Duse_netCDF=1 -g -O2 -I/burg/opt/netcdf-fortran-4.5.3/include -fcray-pointer -fdefault-real-8 -fdefault-double-8 -ffree-line-length-none   -I/burg/opt/netcdf-c-4.7.4/include -c ../src/mpp/mpp_data.F90 -I../src/include -I../src/mpp/include
/usr/mpi/gcc/openmpi-4.0.3rc4/include/mpif.h:56: Error: Can't open included file 'mpif-config.h'
make[1]: *** [Makefile.dep:157: mpp_data.o] Error 1
make[1]: Leaving directory '/path/to/MOM6/ac/deps/fms/build'
make: *** [Makefile:46: fms/build/libFMS.a] Error 2

mpif-config.h is definitely present at /usr/mpi/gcc/openmpi-4.0.3rc4/include/mpif-config.h.

Is there an environment variable I can use to get around this?

I am guessing that the AX_MPI macro could not figure out your MPI compiler. You could try something like this:

$ CC=mpicc FC=mpifort make -j

although you might have more problems down the road.
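
If the MPI wrappers are on your PATH (the path in your log suggests Open MPI), you can also sanity-check what they actually invoke before re-running make. Something along these lines should work for Open MPI's wrappers; the -I directory they print ought to contain both mpif.h and mpif-config.h:

 $ mpicc --showme:compile      # Open MPI only; prints the include flags the wrapper adds
 $ mpifort --showme:compile    # the directory listed here should hold mpif.h and mpif-config.h
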

What kind of system are you on? How would you normally compile an MPI program?

I managed to work around this by loading a couple of additional modules:
netcdf-fortran/4.5.3 and netcdf/4.7.4

Those have the following paths:
/path/to/netcdf-c-4.7.4/bin:/path/to/netcdf-fortran-4.5.3/bin

I also got:

checking for netcdf.h... no
checking for nc-config... no
configure: error: Could not find nc-config.
make: *** [Makefile:52: fms/build/Makefile] Error 1

I found nc-config at /path/to/netcdf-c-4.7.4/bin/nc-config.
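
Presumably, prepending those bin directories to PATH before running make would also have worked without loading the modules; something along these lines (using the cluster-specific paths above), though I only tested the module route:

 $ export PATH=/path/to/netcdf-c-4.7.4/bin:/path/to/netcdf-fortran-4.5.3/bin:$PATH   # so configure can find nc-config
 $ make -j
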

What kind of system are you on? How would you normally compile an MPI program?

RHEL 8 managed with Bright Computing; we use whichever version of OpenMPI, Parallel Studio, or MPICH is needed, so I loaded openmpi/4.0.3rc4, which gives us mpicc.

I also had a separate error running ../ac/configure in ~/build:

config.status: executing Makefile.dep commands
../ac/../ac/makedep -o Makefile.dep -e ../ac/../src ../ac/../config_src/infra/FMS1 ../ac/../config_src/external ../ac/../config_src/drivers/solo_driver ../ac/../config_src/memory/dynamic_symmetric
Traceback (most recent call last):
  File "../ac/../ac/makedep", line 301, in <module>
    create_deps(args.path, args.makefile, args.debug, args.exec_target, args.fc_rule, args.link_externals, sys.argv[0])
  File "../ac/../ac/makedep", line 52, in create_deps
    mods, used, cpp, inc, prg = scan_fortran_file( f )
  File "../ac/../ac/makedep", line 237, in scan_fortran_file
    lines = file.readlines()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1798: ordinal not in range(128)
make: *** No rule to make target 'Makefile.dep', needed by 'depend'.  Stop.

So to get around that I used a newer version of Python, 3.8.5 from anaconda/3-2020.11, and voilà:

ls -l MOM6
-rwxr-xr-x 1 me user 35230464 Nov 16 15:07 MOM6
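
(My guess is that the system Python 3.6 fell back to the ASCII codec because the shell locale was C/POSIX; if so, forcing a UTF-8 locale before configuring would probably also have worked without switching Pythons, for example:

 $ export LC_ALL=en_US.UTF-8   # or C.UTF-8 if installed; Python 3.6 then opens files as UTF-8 by default
 $ ../ac/configure

but I have not tested that.)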

I'll close this, but perhaps I've provided some clues as to what could improve other users' experience.

Are you saying that changing your netCDF modules somehow fixed an MPI issue? Or was that a separate unrelated issue?

I'd like to understand the following a bit better:

  • The AX_MPI macro should have detected and killed this run before attempting to build it. I don't believe these macros explicitly check for mpif-config.h, which might be part of the reason (a manual check is sketched after these questions). But it looks like you were able to do something to fix it?

  • Unicode support in makedep ought to be something we can fix in older Python versions. Do you know which file failed? (If not, then I might be able to figure it out based on the error.)
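
For what it's worth, a manual check along these lines (outside of configure) should show whether the Fortran MPI headers resolve at all; it is only a rough stand-in for what the macro tests, not the actual check:

 $ cat > conftest.f90 <<'EOF'
 program conftest
   ! FMS does `include 'mpif.h'`; Open MPI's mpif.h in turn includes mpif-config.h
   include 'mpif.h'
   print *, MPI_COMM_WORLD
 end program conftest
 EOF
 $ mpifort -c conftest.f90 && echo "Fortran MPI headers OK"
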

Are you saying that changing your netCDF modules somehow fixed an MPI issue? Or was that a separate unrelated issue?
I'd like to understand the following a bit better:

  • The AX_MPI macro should have detected and killed this run before attempting to build it. I don't believe these macros explicitly check for mpif-config.h, which might be part of the reason. But it looks like you were able to do something to fix it?

By loading these three modules (netcdf-fortran/4.5.3, netcdf/4.7.4, and openmpi/4.0.3rc4), the errors went away.
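
For reference, the full sequence that worked for me was roughly the following (module names and versions are specific to our cluster):

 $ module load netcdf/4.7.4 netcdf-fortran/4.5.3 openmpi/4.0.3rc4
 $ module load anaconda/3-2020.11    # newer Python, for the makedep issue below
 # then make -j in ac/deps as before, followed by ../ac/configure && make -j in the build directory
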

  • Unicode support in makedep ought to be something we can fix in older Python versions. Do you know which file failed? (If not, then I might be able to figure it out based on the error.)

Not really, but I can show where the error happens by pasting from the earlier configure attempt:

checking for gcc option to accept ISO C89... none needed
checking for mpicc... mpicc
checking for mpi.h... yes
checking size of jmp_buf... 200
checking size of sigjmp_buf... 200
configure: creating ./config.status
config.status: creating Makefile
config.status: executing Makefile.dep commands
../ac/../ac/makedep -o Makefile.dep -e ../ac/../src ../ac/../config_src/infra/FMS1 ../ac/../config_src/external ../ac/../config_src/drivers/solo_driver ../ac/../config_src/memory/dynamic_symmetric

By loading these three modules (netcdf-fortran/4.5.3, netcdf/4.7.4, and openmpi/4.0.3rc4), the errors went away.

I see, that makes more sense. Perhaps some MPI elements were visible before openmpi/4.0.3rc4 was loaded, but not mpif-config.h. The AX_MPI macro is the only one that I took from the Autoconf Archive, so maybe there is an issue in there; I will look into that.

The netcdf macros appear to have worked correctly, with the missing libraries/headers detected and the environment modules resolving the problem.

The Unicode issue feels like something we should fix, though; I will look into it.

Either way, thanks very much for this feedback; it is invaluable and the kind we don't often get to hear about.