Fork of https://gitlab.mpcdf.mpg.de/mtr/ducc to simplify external contributions

Distinctly Useful Code Collection (DUCC)

This is a collection of basic programming tools for numerical computation, including Fast Fourier Transforms, Spherical Harmonic Transforms, non-equispaced Fourier transforms, as well as some concrete applications like 4pi convolution on the sphere and gridding/degridding of radio interferometry data.

The code is written in C++17, but provides a simple and comprehensive Python interface.

Requirements

  • Python >= 3.8
  • only when compiling from source: pybind11
  • only when compiling from source: a C++17-capable compiler, e.g.
    • g++ 7 or later
    • clang++
    • MSVC 2019 or later
    • Intel icpx (oneAPI compiler series). (Note that the older icpc compilers are not supported.)

Sources

The latest version of DUCC can be obtained by cloning the repository via

git clone https://gitlab.mpcdf.mpg.de/mtr/ducc.git

Licensing terms

  • All source code in this package is released under the terms of the GNU General Public License v2 or later.
  • Some files (those constituting the FFT component and its internal dependencies) are also licensed under the 3-clause BSD license. These files contain two sets of licensing headers; the user is free to choose under which of those terms they want to use these sources.

Documentation

Online documentation of the most recent Python interface is available at https://mtr.pages.mpcdf.de/ducc.

The C++ interface is documented at https://mtr.pages.mpcdf.de/ducc/cpp. Please note that this interface is not as well documented as the Python one, and that it should not be considered stable.

Installation

For best performance, it is recommended to compile DUCC from source, optimizing for the specific CPU on the system. This can be done using the command

pip3 install --no-binary ducc0 --user ducc0

NOTE: compilation requires the appropriate compilers to be installed (see above) and can take a significant amount of time (several minutes).

Alternatively, a simple

pip3 install --user ducc0

will install a pre-compiled binary package, which makes the installation process much quicker and does not require any compilers to be installed on the system. However, the code will most likely perform significantly worse (by a factor of two to three for some functions) than a custom-built version.
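After either command, one way to confirm that the package is visible to the current interpreter is sketched below. It uses only the Python standard library; the helper name `ducc0_status` is made up for this illustration and is not part of DUCC.

```python
import importlib.util
from importlib import metadata


def ducc0_status():
    """Return the installed ducc0 version string, or None if it is not importable."""
    # find_spec only locates the package; it does not import the compiled module.
    if importlib.util.find_spec("ducc0") is None:
        return None
    try:
        return metadata.version("ducc0")
    except metadata.PackageNotFoundError:
        # Importable, but without distribution metadata (e.g. a manual build).
        return "unknown"


if __name__ == "__main__":
    version = ducc0_status()
    print("ducc0 not found" if version is None else f"ducc0 version: {version}")
```

Note that this only reports whether and which version is installed; it cannot tell a binary wheel from a locally compiled build.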

Additionally, pre-compiled binaries are distributed for the following systems:

[Image: packaging status badge]

Building only the C++ part

If you want to use ducc's algorithms in C++ code, there is a CMakeLists.txt file to help you integrate the library into your project. Please use the C++ interface only as an internal dependency of your projects and do not install the ducc0 C++ library system-wide, since its interface is not guaranteed to be stable and is in fact expected to change significantly in the future.

DUCC components

ducc.fft

This package provides Fast Fourier, trigonometric and Hartley transforms with a simple Python interface. It is an evolution of pocketfft and pypocketfft, which are currently used by numpy and scipy.

The central algorithms are derived from Paul Swarztrauber's FFTPACK code.

Features

  • supports fully complex and half-complex (i.e. complex-to-real and real-to-complex) FFTs, discrete sine/cosine transforms and Hartley transforms
  • achieves very high accuracy for all transforms
  • supports multidimensional arrays and selection of the axes to be transformed
  • supports single, double, and long double precision
  • makes use of CPU vector instructions, except for short 1D transforms
  • supports prime-length transforms without degrading to O(N**2) performance
  • has optional multi-threading support for all transforms except short 1D ones.
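The half-complex and Hartley transforms listed above rest on two identities that a naive reference DFT makes easy to see: for real input, X[n-k] = conj(X[k]), so only n//2 + 1 coefficients need to be stored, and the Hartley transform equals Re(X[k]) - Im(X[k]). The following is a minimal pure-Python sketch of those identities, not ducc's implementation (which is C++ and far faster); all function names are made up for illustration.

```python
import cmath


def dft(x):
    """Naive O(N**2) forward DFT, used only as a readable reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def real_to_half_complex(x):
    """Keep only the n//2 + 1 non-redundant coefficients of a real-input DFT."""
    return dft(x)[: len(x) // 2 + 1]


def half_complex_to_full(h, n):
    """Rebuild all n coefficients using the symmetry X[n-k] = conj(X[k])."""
    full = list(h)
    for k in range(len(h), n):
        full.append(h[n - k].conjugate())
    return full


def hartley(x):
    """Discrete Hartley transform via the DFT: H[k] = Re(X[k]) - Im(X[k])."""
    return [c.real - c.imag for c in dft(x)]


if __name__ == "__main__":
    data = [1.0, 2.0, -0.5, 3.0, 0.25, -1.0, 4.0]
    half = real_to_half_complex(data)
    print(len(data), "real samples ->", len(half), "complex coefficients")
```

Applying `hartley` twice returns the input scaled by its length, which is why a Hartley transform needs no separate inverse routine.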

Design decisions and performance characteristics

  • there is no explicit plan management to be done by the user, making the interface as simple as possible. A small number of plans is cached internally, which does not consume much memory, since the storage requirement for a plan only scales with the square root of the FFT length for large lengths.
  • 1D transforms are somewhat slower than those provided by FFTW (if FFTW's plan generation overhead is ignored)
  • multi-D transforms in double precision perform similarly to FFTW with FFTW_MEASURE; in single precision ducc.fft can be significantly faster.

ducc.nufft

Library for non-uniform FFTs in 1D/2D/3D (currently only transform types 1 and 2 are supported). The goal is to match or exceed the performance and accuracy of FINUFFT, making use of lessons learned during the implementation of the wgridder module (see below).
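For readers unfamiliar with the terminology, the two transform types can be written as direct sums. The sketch below follows the commonly used definitions (type 1: non-uniform points to uniform modes; type 2: uniform modes to non-uniform points) in 1D with points on a 2π-periodic domain. The function names, the centered mode ordering and the default signs are assumptions made for illustration; ducc.nufft's own conventions may differ, and a real NUFFT replaces these O(N·M) sums with a fast approximate algorithm.

```python
import cmath


def modes(n):
    """Integer frequencies -(n//2) ... (n-1)//2, a common centered ordering."""
    return range(-(n // 2), (n - 1) // 2 + 1)


def nufft_type1_direct(points, strengths, n_modes, sign=+1):
    """Non-uniform -> uniform: f[k] = sum_j c[j] * exp(sign * i * k * x[j])."""
    return [sum(c * cmath.exp(sign * 1j * k * x) for x, c in zip(points, strengths))
            for k in modes(n_modes)]


def nufft_type2_direct(points, coeffs, sign=-1):
    """Uniform -> non-uniform: c[j] = sum_k f[k] * exp(sign * i * k * x[j])."""
    ks = modes(len(coeffs))
    return [sum(f * cmath.exp(sign * 1j * k * x) for k, f in zip(ks, coeffs))
            for x in points]


if __name__ == "__main__":
    pts = [0.1, 1.7, -2.3, 3.0]
    print(nufft_type1_direct(pts, [1.0, 0.5, -1.0, 2.0], n_modes=4))
```

With opposite signs, the two routines are adjoints of each other, which makes direct sums like these a convenient reference when checking a fast implementation.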

ducc.sht

This package provides efficient spherical harmonic transforms (SHTs). Its code is derived from libsharp, but has been significantly enhanced.

Noteworthy features

  • very efficient support for spherical harmonic synthesis ("alm2map") operations and their adjoint for any grid based on iso-latitude rings with equidistant pixels in each of the rings.
  • support for the same operations on entirely arbitrary spherical grids, i.e. without constraints on pixel locations. This is implemented via intermediate iso-latitude grids and non-uniform FFTs.
  • support for accurate spherical harmonic analysis on certain sub-classes of grids (Clenshaw-Curtis, Fejer-1 and McEwen-Wiaux) at band limits beyond those for which quadrature weights exist. For details see this note.
  • iterative approximate spherical harmonic analysis on arbitrary grids.
  • substantially improved transformation speed (up to a factor of 2) on the above-mentioned grid geometries for high band limits.
  • accelerated recurrences as presented in Ishioka (2018)
  • vector instruction support
  • multi-threading support

The code for rotating spherical harmonic coefficients was taken (with some modifications) from Mikael Slevinsky's FastTransforms package.

ducc.healpix

This library provides Python bindings for the most important functionality related to the HEALPix tessellation, except for spherical harmonic transforms, which are covered by ducc.sht.

The design goals are

  • similarity to the interface of the HEALPix C++ library (while respecting some Python peculiarities)
  • simplicity (no optional function parameters)
  • low function calling overhead

ducc.totalconvolve

Library for high-accuracy 4pi convolution on the sphere, which generates a total convolution data cube from a set of sky and beam a_lm and computes interpolated values for a given list of detector pointings. This code has evolved from the original totalconvolver algorithm via the conviqt code.

Algorithmic details:

  • the code uses ducc.sht SHTs and ducc.fft FFTs to compute the data cube
  • shared-memory parallelization is provided via standard C++ threads.
  • for interpolation, the algorithm and kernel described in https://arxiv.org/abs/1808.06736 are used. This allows very efficient interpolation with user-adjustable accuracy.

ducc.wgridder

Library for high-accuracy gridding/degridding of radio interferometry datasets (code paper available at https://arxiv.org/abs/2010.10122). This code has also been integrated into wsclean (https://arxiv.org/abs/1407.1943) as the wgridder component.

Programming aspects

  • shared-memory parallelization via standard C++ threads.
  • kernel computation is performed on the fly, avoiding inaccuracies due to table lookup and reducing overall memory bandwidth

Numerical aspects

  • uses a generalization of the analytical gridding kernel presented in https://arxiv.org/abs/1808.06736
  • uses the "improved W-stacking method" described in https://arxiv.org/abs/2101.11172
  • in combination these two aspects allow extremely accurate gridding/degridding operations (L2 error compared to explicit DFTs can go below 1e-12) with reasonable resource consumption
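To make the accuracy statement above concrete, the sketch below shows what "L2 error compared to explicit DFTs" can mean in practice: a brute-force reference image is computed by summing every visibility at every pixel, and a gridder's output would be compared against it with a relative L2 norm. This is a simplified illustration only. It ignores the w term, and the sign convention, pixel centering, error normalization and all function names are assumptions, not ducc.wgridder's definitions.

```python
import cmath
import math


def dirty_image_dft(uv, vis, npix, pixsize):
    """Explicit flat-sky DFT: pixel = Re(sum_k vis[k] * exp(2*pi*i*(u*l + v*m)))."""
    half = npix // 2
    image = []
    for i in range(npix):
        l = (i - half) * pixsize  # direction cosine along the first image axis
        row = []
        for j in range(npix):
            m = (j - half) * pixsize  # direction cosine along the second axis
            row.append(sum((amp * cmath.exp(2j * math.pi * (u * l + v * m))).real
                           for (u, v), amp in zip(uv, vis)))
        image.append(row)
    return image


def rel_l2_error(a, b):
    """sqrt(sum |a - b|^2 / sum |b|^2) for two equally shaped 2D lists."""
    num = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den = sum(y ** 2 for rb in b for y in rb)
    return math.sqrt(num / den)


if __name__ == "__main__":
    uv = [(10.0, 5.0), (-3.0, 7.0)]  # baseline coordinates in wavelengths
    reference = dirty_image_dft(uv, [1.0, 1.0], npix=4, pixsize=0.01)
    print(rel_l2_error(reference, reference))
```

The explicit sum costs O(number of visibilities × number of pixels), so it is only practical as a reference for small test cases.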

ducc.misc

Various unsorted functionality which will hopefully be categorized in the future.

This module contains an efficient algorithm for the computation of abscissas and weights for Gauss-Legendre quadrature. For degrees up to 100, the solutions are computed in the standard iterative fashion; for higher degrees Ignace Bogaert's FastGL algorithm is used.
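As an illustration of what "the standard iterative fashion" usually refers to, the sketch below finds the roots of the Legendre polynomial P_n by Newton iteration, evaluating P_n through its three-term recurrence, and derives the weights from the derivative at each root. It is a textbook version written in pure Python under that assumption, not DUCC's code, and the FastGL algorithm used for higher degrees is not shown; the function name is made up for this example.

```python
import math


def gauss_legendre(n, tol=1e-14, max_iter=100):
    """Return (nodes, weights) of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes = [0.0] * n
    weights = [0.0] * n
    for i in range((n + 1) // 2):
        # Classical starting guess for the i-th largest root of P_n.
        x = math.cos(math.pi * (i + 0.75) / (n + 0.5))
        for _ in range(max_iter):
            # Three-term recurrence; afterwards p1 = P_n(x), p0 = P_{n-1}(x).
            p0, p1 = 1.0, x
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            # Derivative from (x^2 - 1) P_n'(x) = n (x P_n(x) - P_{n-1}(x)).
            dp = n * (x * p1 - p0) / (x * x - 1.0)
            dx = p1 / dp
            x -= dx
            if abs(dx) < tol:
                break
        w = 2.0 / ((1.0 - x * x) * dp * dp)
        # Roots are symmetric about zero, so each one fills two slots.
        nodes[i], nodes[n - 1 - i] = -x, x
        weights[i] = weights[n - 1 - i] = w
    return nodes, weights


if __name__ == "__main__":
    xs, ws = gauss_legendre(5)
    # A 5-point rule integrates polynomials up to degree 9 exactly.
    print(sum(w * x ** 8 for x, w in zip(xs, ws)), "vs exact", 2.0 / 9.0)
```

Each root costs O(n) work per Newton step, so the whole rule costs O(n²); this is the scaling that makes a faster method attractive for high degrees.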