ValeevGroup/tiledarray

Integration with Umpire 6.0.0

victor-anisimov opened this issue · 5 comments

I'm trying to replace the old Umpire (ver 1.0) in TA with a newer one (ver 6.0.0), which includes SYCL/HIP/CUDA support for GPUs, and I'm running into compilation/configuration issues when compiling for the CUDA platform (V100). I'm using the Intel icx/icpx compilers, which are basically clang equivalents. TA with the old Umpire compiles for CUDA with these compilers without problems. The issues appear only when I make TA build Umpire 6.0.0.

Umpire 6.0.0 requires a dozen dependencies to be downloaded separately from GitHub. This can be done with
git submodule init
git submodule update
issued manually in the Umpire directory after Umpire is downloaded.

I'm trying to instruct TA/cmake to download those dependencies.

The script umpire.cmake that comes with TA includes a line
GIT_SUBMODULES "" # N.B. do not initialize modules!
in the ExternalProject_Add(Umpire ...) portion of the script.
With that in place, the downloaded Umpire code does not compile since parts of the code (dependencies) are missing.

To tell TA/cmake to download the Umpire dependencies, I changed that line to
GIT_SUBMODULES "." # N.B. do not initialize modules!
so that TA/cmake will initialize the dependencies.

In the "original" case when I have GIT_SUBMODULES "" # N.B. do not initialize modules! TA/cmake autogenerates
set(init_submodules FALSE)
if(init_submodules)
execute_process(
COMMAND "/usr/bin/git" submodule update --recursive --init
WORKING_DIRECTORY "/home/vanisimov/tiledarray/tiledarray/build.gpu.icx/external/source/Umpire"
RESULT_VARIABLE error_code
)
endif()
in build/external/tmp, which does not download the dependencies, so the subsequent compilation fails.

In the "enabled" case when I have GIT_SUBMODULES "." # N.B. do not initialize modules! TA/cmake autogenerates
set(init_submodules TRUE)
if(init_submodules)
execute_process(
COMMAND "/usr/bin/git" submodule update --recursive --init .
WORKING_DIRECTORY "/home/vanisimov/tiledarray/tiledarray/build.gpu.icx/external/source/Umpire"
RESULT_VARIABLE error_code
)
endif()
which forces cmake to download dependencies. Indeed, inspecting the Umpire directory after executing make in directory build/ I see that the code in build/external/source/Umpire is complete.

I see the advice in the code, # N.B. do not initialize modules!, but do not know what it means.
Heeding that advice, it would be best to have cmake issue git clone --recursive instead of the usual git clone, so as to avoid the use of submodules, but I do not know how to do that from ExternalProject_Add(). Any suggestions?

What additionally confuses me is that the "submodule update" section is present in both autogenerated scripts, Umpire-gitclone.cmake and Umpire-gitupdate.cmake, in build/external/tmp/.

To be sure, I checked the line git submodule update --recursive --init . with a manual compilation of Umpire 6.0.0; it worked, and all tests passed. With that, I'm confident that the above line is equivalent to separately issued git submodule init and git submodule update. Using git clone --recursive instead of the usual git clone is much easier, though.

Although make now properly downloads all parts of Umpire, it fails in the Umpire configuration step with this error message:

CMake Error at cmake/SetupUmpireThirdParty.cmake:66 (get_target_property):
get_target_property() called with non-existent target "cuda".
Call Stack (most recent call first):
CMakeLists.txt:157 (include)

CMake Error at cmake/SetupUmpireThirdParty.cmake:66 (get_target_property):
get_target_property() called with non-existent target "cuda_runtime".
Call Stack (most recent call first):
CMakeLists.txt:157 (include)

-- Host Shared Memory Disabled
-- Configuring incomplete, errors occurred!
See also "/home/vanisimov/tiledarray/tiledarray/build.gpu.icx/external/build/Umpire/CMakeFiles/CMakeOutput.log".
See also "/home/vanisimov/tiledarray/tiledarray/build.gpu.icx/external/build/Umpire/CMakeFiles/CMakeError.log".
CMakeOutput.log
CMakeError.log

I do not see anything in the cmake log files that clarifies the error message. Any help or suggestion would be greatly appreciated!

I got TA/cmake to perform a recursive clone of Umpire by simply commenting out
#GIT_SUBMODULES "" # N.B. do not initialize modules! in umpire.cmake
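
That this works is consistent with the documented ExternalProject behavior: when GIT_SUBMODULES is not given at all, all submodules are updated, and with CMake 3.17 or newer the GIT_SUBMODULES_RECURSE option (on by default) makes that update recursive. An empty string means "no submodules" only when policy CMP0097 is NEW. A minimal download-only sketch is below; the project name, the UMPIRE_URL and UMPIRE_TAG variables and their values are assumptions for illustration and are not taken from TA's umpire.cmake.

```cmake
cmake_minimum_required(VERSION 3.17)  # GIT_SUBMODULES_RECURSE requires 3.17+
project(umpire_fetch_sketch NONE)     # hypothetical project; no languages needed

include(ExternalProject)

# Assumed values, for illustration only.
set(UMPIRE_URL "https://github.com/LLNL/Umpire.git")
set(UMPIRE_TAG "v6.0.0")

ExternalProject_Add(Umpire
  GIT_REPOSITORY ${UMPIRE_URL}
  GIT_TAG        ${UMPIRE_TAG}
  # GIT_SUBMODULES is deliberately NOT set: with the option absent,
  # ExternalProject initializes and updates every submodule.
  # Passing "" instead would disable submodules under CMP0097 NEW.
  GIT_SUBMODULES_RECURSE ON   # the default, spelled out for clarity
  # Download only: empty commands skip configure, build and install.
  CONFIGURE_COMMAND ""
  BUILD_COMMAND     ""
  INSTALL_COMMAND   ""
)
```

Building the Umpire target of such a project should leave a source tree with the submodules populated, which is the same end state as git clone --recursive.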

The next thing to deal with is the cmake error about non-existent targets:
CMake Error at cmake/SetupUmpireThirdParty.cmake:66 (get_target_property):
get_target_property() called with non-existent target "cuda".

CMake Error at cmake/SetupUmpireThirdParty.cmake:66 (get_target_property):
get_target_property() called with non-existent target "cuda_runtime".

The error comes from build/external/source/Umpire/cmake/SetupUmpireThirdParty.cmake, which includes:
set(TPL_DEPS)
blt_list_append(TO TPL_DEPS ELEMENTS cuda cuda_runtime IF ENABLE_CUDA)
blt_list_append(TO TPL_DEPS ELEMENTS hip hip_runtime IF ENABLE_HIP)
blt_list_append(TO TPL_DEPS ELEMENTS openmp IF ENABLE_OPENMP)
blt_list_append(TO TPL_DEPS ELEMENTS mpi IF ENABLE_MPI)

foreach(dep ${TPL_DEPS})
# If the target is EXPORTABLE, add it to the export set
get_target_property(_is_imported ${dep} IMPORTED)
if(NOT ${_is_imported})
install(TARGETS ${dep}
EXPORT umpire-targets
DESTINATION lib)
# Namespace target to avoid conflicts
set_target_properties(${dep} PROPERTIES EXPORT_NAME umpire::${dep})
endif()
endforeach()

If I comment out the foreach section, the error goes away.

Is there a more proper way to deal with the error coming from get_target_property()?
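
One less invasive option than commenting out the whole loop is to guard it with if(TARGET ...), so that names which do not correspond to a target are skipped rather than passed to get_target_property(). The sketch below is a standalone illustration, not a patch to Umpire: the project name is hypothetical and TPL_DEPS is filled in by hand instead of through blt_list_append(). Note that this only silences the symptom. If the cuda and cuda_runtime targets are supposed to be created by the CUDA setup when ENABLE_CUDA is on (an assumption here), their absence may point to a CUDA detection problem that is still worth tracking down.

```cmake
cmake_minimum_required(VERSION 3.14)
project(tpl_guard_sketch NONE)   # hypothetical project name

# Stand-in for the list Umpire builds with blt_list_append().
set(TPL_DEPS cuda cuda_runtime)

foreach(dep ${TPL_DEPS})
  # Skip names that are not targets in this configuration instead of
  # letting get_target_property() raise an error.
  if(NOT TARGET ${dep})
    message(STATUS "Skipping '${dep}': no such target")
    continue()
  endif()

  # Only non-imported targets can be installed and exported.
  get_target_property(_is_imported ${dep} IMPORTED)
  if(NOT _is_imported)
    install(TARGETS ${dep}
            EXPORT umpire-targets
            DESTINATION lib)
    # Namespace the exported name to avoid conflicts.
    set_target_properties(${dep} PROPERTIES EXPORT_NAME umpire::${dep})
  endif()
endforeach()
```

Configured on its own, this only prints the two "Skipping" status messages, since neither target exists in the sketch.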

Got TA 48eac1d compiled with Umpire 5.0.1 and Umpire 6.0.0. Figured out that I need to use BUILDTYPE=RelWithDebInfo in order to build the TA unit tests. With both versions of Umpire, the number of unit tests failing on V100 changes from run to run. I cannot use such non-reproducible tests to validate the TA integration with Umpire for the CUDA and SYCL platforms. Please suggest how to test Umpire with TA.

In tiledarray/src/TiledArray/external/umpire.h
one needs to replace
#include <umpire/strategy/DynamicPool.hpp>
with
#include <umpire/strategy/QuickPool.hpp>

I retested the Umpire 5.0.1 and 6.0.0 unit tests on their own (without TiledArray). All of them complete successfully on an NVIDIA V100. The real problem with Umpire 5.0.1 is that it does not compile on the SYCL platform; it throws a compiler error. This error is fixed in Umpire 6.0.0, which compiles successfully on the SYCL platform.

@victor-anisimov I created #311 to bump Umpire to v6. I did not notice the submodule issues you referred to at the beginning ... I suspect these were resolved via 7c6e354

I'll have a look at the CUDA tests again