Fortran UNified Device Acceleration Library
- Stefano Zaghi, stefano.zaghi@cnr.it
- Giacomo Rossi, giacomo.rossi@intel.com
- Andrea di Mascio, andrea.dimascio@univaq.it
- Francesco Salvadore, f.salvadore@cineca.it
- KISS, keep it simple and stupid;
- easy handling of OpenACC memory offloading on (highly parallel) accelerated devices (GPU);
- easy handling of OpenMP memory offloading on (highly parallel) accelerated devices (GPU);
- MPI enabled for multi-device clusters;
- Free, Open Source Project.
- NVIDIA HPC SDK, NVFortran: fully supports the OpenACC backend, works on NVIDIA GPUs, tested with v12.3;
- INTEL IFX: fully supports the OpenMP backend, works on INTEL GPUs, tested with v2024.0.2-20231213;
- GNU gfortran: partially supports the OpenACC backend; it compiles, but not all tests work; tested with v13.1.0;
| What is FUNDAL? | Status | Copyrights | A taste of FUNDAL | Documentation | Install |
OpenACC and OpenMP allow managing the memory of (highly parallel, accelerated) devices by means of runtime routines, e.g. to allocate device memory and copy to/from the device. These routines, in general, handle C pointers: FUNDAL provides a convenient Fortran API to the OpenMP/OpenACC runtime routines, handling the C data in the background and thus simplifying the end-user experience. The FUNDAL API is designed to seamlessly unify the calls to OpenACC and OpenMP runtime routines in order to minimize the end user's effort in developing device-offloaded applications.
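To give an idea of what handling C data in the background can involve, the minimal sketch below uses only standard Fortran to associate a raw C address with a Fortran array pointer that has user-defined lower bounds. It is not FUNDAL's implementation: host memory stands in for device memory, and the program name and the variables `storage`, `flat` and `a` are hypothetical.

```fortran
program c_pointer_remap_sketch
   ! Sketch only: view a raw C address as a Fortran array pointer with
   ! arbitrary lower bounds. Host memory stands in for device memory.
   use, intrinsic :: iso_c_binding,   only : c_ptr, c_loc, c_f_pointer
   use, intrinsic :: iso_fortran_env, only : I4P=>int32, R8P=>real64
   implicit none
   integer(I4P), parameter :: lbounds(2)=[-1,-2]    ! requested lower bounds
   integer(I4P), parameter :: ubounds(2)=[ 1, 2]    ! requested upper bounds
   real(R8P), target       :: storage(3*5)          ! stand-in for a buffer returned by a runtime allocator
   type(c_ptr)             :: cptr                  ! raw C address, as C runtime routines return it
   real(R8P), pointer      :: flat(:,:)=>null()     ! view with default lower bounds (1)
   real(R8P), pointer      :: a(:,:)=>null()        ! view with the requested lower bounds

   storage = 0._R8P
   cptr = c_loc(storage)
   ! associate a Fortran pointer with the C address; the extents are ubounds-lbounds+1
   call c_f_pointer(cptr, flat, shape=ubounds-lbounds+1)
   ! remap the lower bounds (Fortran 2003 pointer bounds specification)
   a(lbounds(1):, lbounds(2):) => flat
   ! the first element of the remapped view is the first element of the buffer
   a(-1,-2) = 42._R8P
   print '(A,2I3)', 'lower bounds: ', lbound(a)
   print '(A,2I3)', 'upper bounds: ', ubound(a)
   if (storage(1) /= 42._R8P) error stop 'remapping failed'
endprogram c_pointer_remap_sketch
```

With a real device allocator the C address would refer to device memory, so the resulting pointer could only be dereferenced inside device code.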
Status of implemented API:
- device memory handling:
- dev_malloc
- OpenACC
- OpenMP
- dev_memcpy
- OpenACC
- OpenMP
- dev_memcpy_to_device
- OpenACC
- OpenMP
- dev_memcpy_from_device
- OpenACC
- OpenMP
- dev_free
- OpenACC
- OpenMP
- device handling:
- dev_get_device_num
- OpenACC
- OpenMP
- dev_get_device_type
- OpenACC
- OpenMP
- dev_get_host_num
- OpenACC
- OpenMP
- dev_get_num_devices
- OpenACC
- OpenMP
- dev_get_property_string
- OpenACC
- OpenMP
FUNDAL is an open source project distributed under a multi-licensing system:
- for FOSS projects:
- for closed source/commercial projects:
Anyone interested in using, developing or contributing to FUNDAL is welcome: feel free to select the license that best matches your soul!
More details can be found on the wiki.
A minimal example of FUNDAL usage is contained in src\examples\fundal_taste.F90 and is reported below.
program fundal_taste
   use, intrinsic :: iso_fortran_env, only : I4P=>int32, R8P=>real64 ! portable kinds
   use :: fundal                                                     ! FUNDAL library
   implicit none
   real(R8P), pointer :: a_dev(:,:,:)=>null() ! device memory
   real(R8P), pointer :: b_hos(:,:,:)=>null() ! host memory
   integer(I4P)       :: ierr                 ! error status
   integer(I4P)       :: i, j, k              ! counter

   ! initialize environment global variables
   myhos = dev_get_host_num()      ! get host ID
   devtype = dev_get_device_type() ! get device type
   call dev_set_device_num(0)      ! set device ID (in complex scenario this ID is less trivial than 0, e.g. MPI)
   mydev = dev_get_device_num()    ! get device ID
   ! allocate device memory
   call dev_alloc(fptr_dev=a_dev,lbounds=[-1,-2,-3],ubounds=[1,2,3],ierr=ierr,dev_id=mydev)
   ! allocate host memory
   allocate(b_hos(-1:1,-2:2,-3:3))
   ! set host memory
   b_hos = -3._R8P
   ! copy to device
   call dev_memcpy_to_device(fptr_dst=a_dev, fptr_src=b_hos)
   ! work on device
   !$acc parallel loop independent deviceptr(a_dev) collapse(3)
   !$omp target teams distribute parallel do collapse(3) has_device_addr(a_dev)
   do k=-3,3
      do j=-2,2
         do i=-1,1
            a_dev(i,j,k) = a_dev(i,j,k) / 2._R8P
         enddo
      enddo
   enddo
   ! copy from device
   call dev_memcpy_from_device(fptr_dst=b_hos, fptr_src=a_dev)
   ! check results
   print*, b_hos
endprogram fundal_taste
The device memory must be defined as pointer while host memory can be either pointer or allocatable.
Memory handling (allocate, copy, free) is seamless thanks to a unified API for both the OpenACC and OpenMP paradigms: e.g. call dev_memcpy_from_device(fptr_dst=b_hos, fptr_src=a_dev) copies memory from device to host with both OpenACC and OpenMP, without the need to write different code for the two backends or to wrap snippets in conditional preprocessing macros.
Additionally, note that OpenACC pragmas are ignored when the code is compiled with OpenMP flags but without OpenACC flags (and vice versa), thus there is no need to wrap pragmas in conditional preprocessing macros.
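The sketch below illustrates this point with standard Fortran only (no FUNDAL calls; the program name is hypothetical). A single loop carries both an OpenACC and an OpenMP directive: to a compiler invoked without the corresponding flag each directive is just a comment, so the same source builds as a serial program, as an OpenACC program or as an OpenMP program. It assumes that only one of the two backends is enabled at a time.

```fortran
program dual_directive_sketch
   ! Sketch only: one loop, two directives, no preprocessing macros.
   use, intrinsic :: iso_fortran_env, only : I4P=>int32, R8P=>real64
   implicit none
   real(R8P)    :: a(8) ! host array, mapped to the device by the active directive
   integer(I4P) :: i    ! counter

   a = 4._R8P
   ! honoured only when OpenACC is enabled, otherwise a plain comment
   !$acc parallel loop copy(a)
   ! honoured only when OpenMP is enabled, otherwise a plain comment
   !$omp target teams distribute parallel do map(tofrom:a)
   do i=1, 8
      a(i) = a(i) / 2._R8P
   enddo
   print*, a
   if (any(a /= 2._R8P)) error stop 'unexpected result'
endprogram dual_directive_sketch
```

Unlike the FUNDAL example above, this sketch lets the directives move the data (copy and map clauses) instead of using explicitly allocated device pointers.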
FUNDAL is a module-based Fortran library and must be compiled according to the modules' hierarchy.
A fobos file is provided for easy building by means of the FoBiS.py program.
Currently only the NVIDIA HPC SDK (NVFortran) and INTEL IFX compilers are fully supported; GNU gfortran is only partially supported.
In the following, the bare minimum information needed to build FUNDAL tests is reported. For more detailed documentation on tests see the tests documentation.
To build tests and examples with the OpenACC backend by means of the NVIDIA HPC SDK, type:
FoBiS.py build -mode fundal-test-oac-nvf
tree exe/
exe/
├── fundal_alloc_free_test
├── fundal_array_access_test
├── fundal_derived_type_memcpy_test
├── fundal_device_handling_test
├── fundal_memcpy_test
├── fundal_use_test
To build tests and examples with the OpenMP backend by means of the INTEL IFX compiler, type:
FoBiS.py build -mode fundal-test-omp-ifx
tree exe/
exe/
├── fundal_alloc_free_test
├── fundal_array_access_test
├── fundal_derived_type_memcpy_test
├── fundal_device_handling_test
├── fundal_memcpy_test
├── fundal_use_test
All tests can be executed without any argument; a successful execution prints test passed.
All tests can also be executed with a single script:
utils/run_tests.sh
Moreover, the tests can be built and executed by means of FoBiS.py:
# only execution
FoBiS.py rule -ex run-tests
Executing rule "run-tests"
Command => utils/run_tests.sh
...
# build and execution with OpenACC-NVF
FoBiS.py rule -ex build-run-tests-oac-nvf
Executing rule "build-run-tests-oac-nvf"
Command => FoBiS.py clean
Command => FoBiS.py build -mode fundal-test-oac-nvf
Command => FoBiS.py rule -ex run-tests
...
# build and execution with OpenMP-IFX
FoBiS.py rule -ex build-run-tests-omp-ifx
Executing rule "build-run-tests-omp-ifx"
Command => FoBiS.py clean
Command => FoBiS.py build -mode fundal-test-omp-ifx
Command => FoBiS.py rule -ex run-tests
...
In the following, the API of each FUNDAL routine is documented in detail, with examples.
- Device memory handling
- Device handling
Runtime routines to handle device memory.
The dev_alloc routine allocates space in the current device memory. The signature is:
subroutine dev_alloc(fptr_dev, ubounds, ierr, dev_id, lbounds, init_value)
real/integer, intent(out), pointer :: fptr_dev(..) !< Pointer to allocated memory.
integer(I4P), intent(in) :: ubounds(:) !< Array upper bounds.
integer(I4P), intent(out) :: ierr !< Error status.
fptr_dev is a pointer array of any rank up to 7 of real (kinds R8P, R4P) or integer (kinds I8P, I4P, I1P).
ubounds is an integer array of rank 1 containing the upper bounds of fptr_dev.
ierr returns the error status of the allocation; it is 0 for a successful allocation.
integer(I4P), intent(in), optional :: dev_id !< Device ID.
integer(I4P), intent(in), optional :: lbounds(:) !< Array lower bounds, 1 if not passed.
real/integer, intent(in), optional :: init_value !< Optional initial value.
dev_id is the device number (ID) on which the allocation happens. It is not used for OpenACC. For OpenMP, if it is not passed, it defaults to the global environment variable mydev (which must be previously initialized by means of dev_get_device_num).
lbounds is an integer array of rank 1 containing the lower bounds of fptr_dev. All lower bounds are set to 1 if it is not passed.
init_value is a real/integer scalar (of the same kind as fptr_dev): if it is passed, it is used to initialize fptr_dev with a parallel device loop.
dev_alloc usage example
use :: fundal
...
real(R8P), pointer :: a(:,:,:)
integer(I4P) :: ierr
...
call dev_alloc(fptr_dev=a,lbounds=[-1,-2,-3],ubounds=[1,2,3],init_value=1._R8P,ierr=ierr)
...
The dev_memcpy_from_device routine copies data from device memory to local host memory.
subroutine dev_memcpy_from_device(fptr_dst, fptr_src)
real/integer, intent(out), target :: fptr_dst(:) !< Destination memory (host memory).
real/integer, intent(in), target :: fptr_src(:) !< Source memory (device memory).
fptr_dst is a target, host memory, array of any rank up to 7 of real (kinds R8P, R4P) or integer (kinds I8P, I4P, I1P).
fptr_src is a target, device memory, array of any rank up to 7 of real (kinds R8P, R4P) or integer (kinds I8P, I4P, I1P).
dev_memcpy_from_device usage example
use :: fundal
...
real(R8P), pointer :: a(:,:,:)
real(R8P), allocatable :: b(:,:,:)
...
call dev_memcpy_from_device(fptr_dst=b, fptr_src=a)
...
The dev_memcpy_to_device routine copies data from local host memory to device memory.
subroutine dev_memcpy_to_device(fptr_dst, fptr_src)
real/integer, intent(out), target :: fptr_dst(:) !< Destination memory (device memory).
real/integer, intent(in), target :: fptr_src(:) !< Source memory (host memory).
fptr_dst is a target, device memory, array of any rank up to 7 of real (kinds R8P, R4P) or integer (kinds I8P, I4P, I1P).
fptr_src is a target, host memory, array of any rank up to 7 of real (kinds R8P, R4P) or integer (kinds I8P, I4P, I1P).
dev_memcpy_to_device usage example
use :: fundal
...
real(R8P), pointer :: a(:,:,:)
real(R8P), allocatable :: b(:,:,:)
...
call dev_memcpy_to_device(fptr_dst=a, fptr_src=b)
...
The dev_free routine frees memory on the current device.
subroutine dev_free(fptr_dev, dev_id)
real/integer, intent(out), pointer :: fptr_dev(..) !< Pointer to allocated memory.
fptr_dev is a pointer array of any rank up to 7 of real (kinds R8P, R4P) or integer (kinds I8P, I4P, I1P).
integer(I4P), intent(in), optional :: dev_id !< Device ID.
dev_id is the device number (ID) on which the memory was allocated. It is not used for OpenACC. For OpenMP, if it is not passed, it defaults to the global environment variable mydev (which must be previously initialized by means of dev_get_device_num).
dev_free usage example
use :: fundal
...
real(R8P), pointer :: a(:,:,:)
...
call dev_free(fptr_dev=a)
...
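The memory handling routines documented above can be combined into a full allocate, compute, copy back, free cycle. The following is a sketch under stated assumptions rather than a verified program: it uses only the calls and keywords shown in this document, it requires FUNDAL and one of the supported compilers, and the program name is hypothetical. The pointer is passed to dev_free positionally, an assumption made here to avoid relying on the dummy argument name.

```fortran
program fundal_lifecycle_sketch
   ! Sketch only: requires the FUNDAL library, not verified against it here.
   use, intrinsic :: iso_fortran_env, only : I4P=>int32, R8P=>real64 ! portable kinds, as in fundal_taste
   use :: fundal                                                     ! FUNDAL library
   implicit none
   real(R8P), pointer     :: a_dev(:,:,:)=>null() ! device memory
   real(R8P), allocatable :: b_hos(:,:,:)         ! host memory
   integer(I4P)           :: ierr                 ! error status
   integer(I4P)           :: i, j, k              ! counter

   ! initialize the global device ID, as in fundal_taste
   call dev_set_device_num(0)
   mydev = dev_get_device_num()
   ! allocate device memory, initialized to 1 on the device, and check the error status
   call dev_alloc(fptr_dev=a_dev,lbounds=[-1,-2,-3],ubounds=[1,2,3],init_value=1._R8P,ierr=ierr)
   if (ierr /= 0_I4P) error stop 'device allocation failed'
   allocate(b_hos(-1:1,-2:2,-3:3))
   ! work on device
   !$acc parallel loop independent deviceptr(a_dev) collapse(3)
   !$omp target teams distribute parallel do collapse(3) has_device_addr(a_dev)
   do k=-3,3
      do j=-2,2
         do i=-1,1
            a_dev(i,j,k) = a_dev(i,j,k) / 2._R8P
         enddo
      enddo
   enddo
   ! copy the result back to the host and release device memory
   call dev_memcpy_from_device(fptr_dst=b_hos, fptr_src=a_dev)
   call dev_free(a_dev) ! positional argument: an assumption of this sketch
   ! every element is expected to be 1/2
   if (any(b_hos /= 0.5_R8P)) error stop 'unexpected result'
   print*, 'test passed'
endprogram fundal_lifecycle_sketch
```

Passing init_value removes the need for a host-to-device copy when the initial state is uniform; for non-uniform data, dev_memcpy_to_device would be used instead.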
Runtime routines to handle device(s), in particular for complex scenarios such as MPI programming.
To be written.
To be written.
To be written.
To be written.
To be written.
To be written.