afft
is a C/C++ library for FFT related computations. It provides unified interface to various implementations of transforms in C and C++ . The main goals are:
- user friendly interface,
- support for wide range of the features offered by the backend libraries,
- low overhead and
- being multiplatform (
Linux
,Windows
andMacOS
).
Currently supported transfors are:
- Discrete Fourier Transform (DFT) for real and complex inputs (in interleaved or plannar format),
- Discrete Hartley Transform (DHT) of real data and
- Discrete Trigonomic Transform (DTT) of types DCT (1-4) or DST (1-4).
A transform may be executed in-place or out-of-place over multidimensional strided arrays in various precision. The created plans can be stored in a LRU plan cache.
The library supports execution in any floating point precision on CPU
, CUDA
, HIP
and OpenCL
targets and may be distributed over multiple targets or processes (via e. g. MPI).
The transformations are implemented by the backend libraries. Currently, the library supports clFFT
, cuFFT
, FFTW3
, HeFFTe
, hipFFT
, Intel MKL
, PocketFFT
, rocFFT
and VkFFT
. More backend libraries shall be added in the future.
The library can be used as a header only library (C++17 onwards), static/dynamic library (C99/C++17 onwards) or a C++ module (C++20 onwards and CMake 3.28 onwards).
Prerequisities with '*' are optional.
- CMake 3.20 or newer
- C/C++ compiler supporting C++17
- multi-process backend libraries
MPI
*
- target frameworks
CPU
target is always enabledCUDA
* target requires CUDA ToolkitHIP
* target requires ROCm, for NVIDIA GPUs install CUDA Toolkit as wellOpenCL
* target requires OpenCL package
- backend libraries (optional)
clFFT
*cuFFT
* comes with CUDA Toolkit, if you want to use multi-process version, NVHPC Toolkit is requiredFFTW3
*HeFFTe
*hipFFT
* is part of HIP, requiresrocFFT
(AMD GPUs) andcuFFT
(NVIDIA GPUs)Intel MKL
* is part of Intel MKL libraryPocketFFT
is included in this projectrocFFT
* comes with ROCm packageVkFFT
is included in this project, supports CUDA, HIP and OpenCL targets
This library is available under MIT license. See LICENSE
for details.
#include <array>
#include <chrono>
#include <complex>
#include <vector>
#include <afft/afft.hpp>
// PrecT is the precision type of the transform
using PrecT = float;
// alias for std::vector with aligned allocator
template<typename T>
using AlignedVector = std::vector<T, afft::AlignedAllocator<T>>;
// shape of the transform
constexpr std::array<afft::Size, 3> shape{500, 250, 1020};
// padded source shape
constexpr std::array<afft::Size, 3> srcPaddedShape{500, 250, 1024};
// padded destination shape
constexpr std::array<afft::Size, 3> dstPaddedShape{500, 1020, 256};
// order of the axes in the destination shape
constexpr std::array<afft::Axis, 3> dstAxesOrder{0, 2, 1};
// alignment of the memory
constexpr afft::Alignment alignment = afft::Alignment::cpuNative;
int main()
{
// make DFT parameters
afft::dft::Parameters dftParams{};
dftParams.direction = afft::Direction::forward;
dftParams.precision = afft::makePrecision<PrecT>(); // use same precision for source, destination and execution
dftParams.shape = shape;
dftParams.axes = {{1}};
dftParams.normalization = afft::Normalization::none;
dftParams.placement = afft::Placement::outOfPlace;
dftParams.type = afft::dft::Type::complexToComplex;
// make CPU parameters
afft::cpu::Parameters cpuParams{};
cpuParams.threadLimit = 4; // limit the number of threads to 4
// make strides for the source and destination shapes
const auto srcStrides = afft::makeStrides(afft::View<afft::Size, 3>{srcPaddedShape});
const auto dstStrides = afft::makeTransposedStrides(afft::View<afft::Size, 3>{dstPaddedShape},
afft::View<afft::Axis, 3>{dstAxesOrder});
// make memory layout
afft::CentralizedMemoryLayout memoryLayout{};
memoryLayout.alignment = alignment;
memoryLayout.complexFormat = afft::ComplexFormat::interleaved; // std::complex uses interleaved format
memoryLayout.srcStrides = srcStrides;
memoryLayout.dstStrides = dstStrides;
// make backend parameters
afft::cpu::BackendParameters backendParams{};
backendParams.strategy = afft::SelectStrategy::first;
backendParams.mask = (afft::BackendMask::fftw3 | afft::BackendMask::mkl | afft::BackendMask::pocketfft);
backendParams.order = {{afft::Backend::mkl, afft::Backend::fftw3}};
backendParams.fftw3.plannerFlag = afft::fftw3::PlannerFlag::measure; // FFTW3 specific planner flag
backendParams.fftw3.timeLimit = std::chrono::seconds{2}; // limit the time for the FFTW3 planner
// make the plan with the parameters
std::unique_ptr<afft::Plan> plan = afft::makePlan(dftParams, cpuParams, memoryLayout, backendParams);
// create source and destination vectors
AlignedVector<std::complex<PrecT>> src(plan->getSrcElemCounts().front()); // source vector
AlignedVector<std::complex<PrecT>> dst(plan->getDstElemCounts().front()); // destination vector
// initialize source vector
// execute the transform
plan->execute(src.data(), dst.data());
// use the result in the destination vector
}