DejvBayer/afft

C/C++ wrapper library for fft-related computations on various targets


afft library

afft is a C/C++ library for FFT-related computations. It provides a unified interface to various transform implementations in C and C++. The main goals are:

a user-friendly interface,
support for a wide range of the features offered by the backend libraries,
low overhead and
being multiplatform (Linux, Windows and macOS).

Currently supported transforms are:

Discrete Fourier Transform (DFT) for real and complex inputs (in interleaved or planar format),
Discrete Hartley Transform (DHT) of real data and
Discrete Trigonometric Transform (DTT) of types DCT (1-4) or DST (1-4).
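
For reference, the DFT and DHT listed above can be written as naive O(n^2) sums. The sketch below is not part of afft and does not use its API; `naiveDft` and `naiveDht` are hypothetical helper names. It only shows what the backends compute (unnormalized, forward direction), which can be handy for checking small results.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive forward DFT (unnormalized): X[k] = sum_j x[j] * exp(-2*pi*i*k*j/n).
std::vector<std::complex<double>> naiveDft(const std::vector<std::complex<double>>& x)
{
  const std::size_t n  = x.size();
  const double      pi = std::acos(-1.0);

  std::vector<std::complex<double>> out(n);
  for (std::size_t k = 0; k < n; ++k)
  {
    std::complex<double> sum{0.0, 0.0};
    for (std::size_t j = 0; j < n; ++j)
    {
      // reduce k*j modulo n to keep the angle small and accurate
      const double angle = -2.0 * pi * static_cast<double>((k * j) % n) / static_cast<double>(n);
      sum += x[j] * std::polar(1.0, angle);
    }
    out[k] = sum;
  }
  return out;
}

// Naive DHT (unnormalized): H[k] = sum_j x[j] * cas(2*pi*k*j/n), cas(t) = cos(t) + sin(t).
std::vector<double> naiveDht(const std::vector<double>& x)
{
  const std::size_t n  = x.size();
  const double      pi = std::acos(-1.0);

  std::vector<double> out(n, 0.0);
  for (std::size_t k = 0; k < n; ++k)
  {
    for (std::size_t j = 0; j < n; ++j)
    {
      const double angle = 2.0 * pi * static_cast<double>((k * j) % n) / static_cast<double>(n);
      out[k] += x[j] * (std::cos(angle) + std::sin(angle));
    }
  }
  return out;
}
```

For real input the two are related by H[k] = Re(X[k]) - Im(X[k]), which gives a quick consistency check between a DHT result and a DFT result of the same data.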

A transform may be executed in-place or out-of-place over multidimensional strided arrays in various precisions. The created plans can be stored in an LRU plan cache.
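
To illustrate the idea of a least-recently-used (LRU) cache, here is a minimal generic sketch. It is not afft's plan cache and makes no claim about its interface; `LruCache` and its members are hypothetical names. A real plan cache would presumably key on a description of the transform and hold plan objects as values.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Minimal LRU cache: when full, inserting a new key evicts the entry
// that was used least recently.
template<typename Key, typename Value>
class LruCache
{
public:
  explicit LruCache(std::size_t capacity) : mCapacity{capacity} {}

  // Returns a copy of the cached value and marks the entry as most recently used.
  std::optional<Value> get(const Key& key)
  {
    const auto it = mIndex.find(key);
    if (it == mIndex.end())
    {
      return std::nullopt;
    }
    mItems.splice(mItems.begin(), mItems, it->second); // move the entry to the front
    return it->second->second;
  }

  // Inserts or updates an entry, evicting the least recently used one if needed.
  void put(const Key& key, Value value)
  {
    if (mCapacity == 0)
    {
      return;
    }
    const auto it = mIndex.find(key);
    if (it != mIndex.end())
    {
      it->second->second = std::move(value);
      mItems.splice(mItems.begin(), mItems, it->second);
      return;
    }
    if (mItems.size() == mCapacity)
    {
      mIndex.erase(mItems.back().first); // the back is the least recently used entry
      mItems.pop_back();
    }
    mItems.emplace_front(key, std::move(value));
    mIndex[key] = mItems.begin();
  }

  std::size_t size() const { return mItems.size(); }

private:
  using Item = std::pair<Key, Value>;

  std::size_t     mCapacity;
  std::list<Item> mItems; // front = most recently used, back = least recently used
  std::unordered_map<Key, typename std::list<Item>::iterator> mIndex;
};
```

The list keeps the usage order and the map gives constant-time lookup, so both `get` and `put` run in O(1) on average. This matters for plans because creating one can be far more expensive than looking one up.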

The library supports execution in any floating point precision on CPU, CUDA, HIP and OpenCL targets and may be distributed over multiple targets or processes (e.g. via MPI).

The transformations are implemented by the backend libraries. Currently, the library supports clFFT, cuFFT, FFTW3, HeFFTe, hipFFT, Intel MKL, PocketFFT, rocFFT and VkFFT. More backend libraries shall be added in the future.

⚠️ Take into account that not all of the afft functionality is supported by each transform backend.

Build

The library can be used as a header-only library (C++17 onwards), a static/dynamic library (C99/C++17 onwards) or a C++ module (C++20 onwards and CMake 3.28 onwards).

Prerequisites

Prerequisites marked with '*' are optional.

CMake 3.20 or newer
C/C++ compiler supporting C++17
multi-process backend libraries
- MPI*
target frameworks
- CPU target is always enabled
- CUDA* target requires CUDA Toolkit
- HIP* target requires ROCm; for NVIDIA GPUs, install the CUDA Toolkit as well
- OpenCL* target requires OpenCL package
backend libraries (optional)
- clFFT*
- cuFFT* comes with the CUDA Toolkit; the multi-process version requires the NVHPC Toolkit
- FFTW3*
- HeFFTe*
- hipFFT* is part of HIP and requires rocFFT (on AMD GPUs) or cuFFT (on NVIDIA GPUs)
- Intel MKL* is part of the Intel MKL library
- PocketFFT is included in this project
- rocFFT* comes with ROCm package
- VkFFT is included in this project, supports CUDA, HIP and OpenCL targets

License

This library is available under the MIT license. See LICENSE for details.

Examples

Simple 1D complex-to-complex transform along the Y axis of 3D padded data

#include <array>
#include <chrono>
#include <complex>
#include <memory> // std::unique_ptr
#include <vector>

#include <afft/afft.hpp>

// PrecT is the precision type of the transform
using PrecT = float;

// alias for std::vector with aligned allocator
template<typename T>
using AlignedVector = std::vector<T, afft::AlignedAllocator<T>>;

// shape of the transform
constexpr std::array<afft::Size, 3> shape{500, 250, 1020};

// padded source shape
constexpr std::array<afft::Size, 3> srcPaddedShape{500, 250, 1024};

// padded destination shape
constexpr std::array<afft::Size, 3> dstPaddedShape{500, 1020, 256};

// order of the axes in the destination shape
constexpr std::array<afft::Axis, 3> dstAxesOrder{0, 2, 1};

// alignment of the memory
constexpr afft::Alignment alignment = afft::Alignment::cpuNative;

int main()
{
  // make DFT parameters
  afft::dft::Parameters dftParams{};
  dftParams.direction     = afft::Direction::forward;
  dftParams.precision     = afft::makePrecision<PrecT>(); // use same precision for source, destination and execution
  dftParams.shape         = shape;
  dftParams.axes          = {{1}};
  dftParams.normalization = afft::Normalization::none;
  dftParams.placement     = afft::Placement::outOfPlace;
  dftParams.type          = afft::dft::Type::complexToComplex;

  // make CPU parameters
  afft::cpu::Parameters cpuParams{};
  cpuParams.threadLimit = 4; // limit the number of threads to 4

  // make strides for the source and destination shapes
  const auto srcStrides = afft::makeStrides(afft::View<afft::Size, 3>{srcPaddedShape});
  const auto dstStrides = afft::makeTransposedStrides(afft::View<afft::Size, 3>{dstPaddedShape},
                                                      afft::View<afft::Axis, 3>{dstAxesOrder});

  // make memory layout
  afft::CentralizedMemoryLayout memoryLayout{};
  memoryLayout.alignment     = alignment;
  memoryLayout.complexFormat = afft::ComplexFormat::interleaved; // std::complex uses interleaved format
  memoryLayout.srcStrides    = srcStrides;
  memoryLayout.dstStrides    = dstStrides;

  // make backend parameters
  afft::cpu::BackendParameters backendParams{};
  backendParams.strategy          = afft::SelectStrategy::first;
  backendParams.mask              = (afft::BackendMask::fftw3 | afft::BackendMask::mkl | afft::BackendMask::pocketfft);
  backendParams.order             = {{afft::Backend::mkl, afft::Backend::fftw3}};
  backendParams.fftw3.plannerFlag = afft::fftw3::PlannerFlag::measure; // FFTW3 specific planner flag
  backendParams.fftw3.timeLimit   = std::chrono::seconds{2}; // limit the time for the FFTW3 planner

  // make the plan with the parameters
  std::unique_ptr<afft::Plan> plan = afft::makePlan(dftParams, cpuParams, memoryLayout, backendParams);

  // create source and destination vectors
  AlignedVector<std::complex<PrecT>> src(plan->getSrcElemCounts().front()); // source vector
  AlignedVector<std::complex<PrecT>> dst(plan->getDstElemCounts().front()); // destination vector

  // initialize source vector

  // execute the transform
  plan->execute(src.data(), dst.data());

  // use the result in the destination vector
}
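
The strides used in the example can be worked out by hand. The sketch below is standalone C++ and does not call afft; `rowMajorStrides` and `transposedStrides` are hypothetical helpers. It assumes row-major (C order) storage and assumes that the axes order lists the logical axes from slowest to fastest varying in memory. Whether `afft::makeStrides` and `afft::makeTransposedStrides` follow exactly this convention is not confirmed here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Row-major strides (in elements) for a possibly padded shape:
// the last axis is contiguous, each earlier axis steps over all later ones.
template<std::size_t N>
std::array<std::size_t, N> rowMajorStrides(const std::array<std::size_t, N>& paddedShape)
{
  std::array<std::size_t, N> strides{};
  std::size_t stride = 1;
  for (std::size_t i = N; i-- > 0;)
  {
    strides[i] = stride;
    stride *= paddedShape[i];
  }
  return strides;
}

// Strides indexed by logical axis, when memory stores the axes in the order
// given by axesOrder (assumed convention: axesOrder[i] is the logical axis
// placed at memory position i, and paddedShape is given in memory order).
template<std::size_t N>
std::array<std::size_t, N> transposedStrides(const std::array<std::size_t, N>& paddedShape,
                                             const std::array<std::size_t, N>& axesOrder)
{
  const auto memoryStrides = rowMajorStrides(paddedShape);

  std::array<std::size_t, N> strides{};
  for (std::size_t i = 0; i < N; ++i)
  {
    assert(axesOrder[i] < N); // every entry must name a valid logical axis
    strides[axesOrder[i]] = memoryStrides[i];
  }
  return strides;
}
```

Under these assumptions, the padded source shape {500, 250, 1024} gives strides {256000, 1024, 1}, and the padded destination shape {500, 1020, 256} with axes order {0, 2, 1} gives logical strides {261120, 1, 256}. In other words, the Y axis becomes contiguous in the destination.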