A tool to generate short, hard-coded DFT's in Go from annotated lists created with the FFTW tool genfft.
Download and install the tool:
go get github.com/bemasher/genfft
Build the FFTW genfft tool found in the FFTW source:
cd fftw3/genfft
ocamlbuild -classic-display -libs unix,nums gen_notw.native gen_notw_c.native
Two flavors of DFT's are supported. Both compute short fixed-length complex-to-complex no-twiddle DFT's. One takes real and imaginary input as separate arrays, one set each for input and output. The other takes complex arrays for input and output. The transform signature is determined from the annotated list generated by each tool.
func DFT(ri, ii, ro, io []float64) // gen_notw.native
func DFT(xi, xo []complex128) // gen_notw_c.native
Generate an annotated transform:
N=3; gen_notw_c.native -n ${N} -with-istride 1 -with-ostride 1 -standalone -dump-asched cmplx_${N}.alst cmplx_${N}.cout
This will generate and write the transform's operations and constants to the file cmplx_3.alst
:
(:= T1 xi[0])
(:= T2 xi[1])
(:= T3 xi[2])
(:= T4 (+ T2 T3))
(:= T6 (* I (* KP866025403 (+ T3 (- T2)))))
(:= xo[0] (+ T1 T4))
(:= T5 (+ T1 (- (* KP500000000 T4))))
(:= xo[2] (+ T5 (- T6)))
(:= xo[1] (+ T5 T6))
Constants required for the transform are found in cmplx_3.cout
, which is parsed only for lines prefixed by DK and DVK.
DVK(KP500000000, +0.500000000000000000000000000000000000000000000);
DVK(KP866025403, +0.866025403784438646763723170752936183471402627);
The tool reads information about DFT's to transform from config.json
. To transform the size 3 complex DFT, config.json
should contain:
[{ "prefix": "cmplx_3", "func": "DftCmplx3" }]
Executing genfft
with this config produces the output file cmplx_3.go
containing a function named DftCmplx3
:
package dft
func DftCmplx3(xi, xo []complex128) {
const (
I = 1i
KP500000000 = +0.500000000000000000000000000000000000000000000
KP866025403 = +0.866025403784438646763723170752936183471402627
)
T1 := xi[0]
T2 := xi[1]
T3 := xi[2]
T4 := T2 + T3
T6 := I * (KP866025403 * (T3 - T2))
xo[0] = T1 + T4
T5 := T1 - KP500000000*T4
xo[2] = T5 - T6
xo[1] = T5 + T6
}
Transforms safely perform in-place and out-of-place transforms depending on the function arguments.
Tests compare output of a naive DFT to output of the generated DFT's. DFT's pass when they are error is within a specified tolerance. See dft/dft_test.go
for more details.
A benchmark is also provided to compare performance of the various transforms.
11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz + 32GB DDR4:
goos: windows
goarch: amd64
pkg: github.com/bemasher/genfft/dft
cpu: 11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz
BenchmarkFloatDFT/Naive_DFT_N=12-12 815853 1474 ns/op 8.14 MB/s 192 B/op 1 allocs/op
BenchmarkFloatDFT/In_Place_Float_DFT_N=12-12 70460990 17.29 ns/op 694.13 MB/s 0 B/op 0 allocs/op
BenchmarkFloatDFT/Out_of_Place_Float_DFT_N=12-12 70753047 17.00 ns/op 705.75 MB/s 0 B/op 0 allocs/op
BenchmarkCmplxDFT/Naive_Cmplx_DFT_N=12-12 800121 1470 ns/op 8.16 MB/s 192 B/op 1 allocs/op
BenchmarkCmplxDFT/In_Place_Cmplx_DFT_N=12-12 59896080 19.69 ns/op 609.57 MB/s 0 B/op 0 allocs/op
BenchmarkCmplxDFT/Out_of_Cmplx_Place_DFT_N=12-12 60725056 19.08 ns/op 628.92 MB/s 0 B/op 0 allocs/op
PASS
ok github.com/bemasher/genfft/dft 7.316s
Raspberry Pi 2 Model B Rev 1.1:
goos: linux
goarch: arm
pkg: github.com/bemasher/genfft/dft
BenchmarkFloatDFT/Naive_DFT_N=12-4 15980 74300 ns/op 0.16 MB/s 192 B/op 1 allocs/op
BenchmarkFloatDFT/In_Place_Float_DFT_N=12-4 1972201 606.0 ns/op 19.80 MB/s 0 B/op 0 allocs/op
BenchmarkFloatDFT/Out_of_Place_Float_DFT_N=12-4 1959711 611.5 ns/op 19.62 MB/s 0 B/op 0 allocs/op
BenchmarkCmplxDFT/Naive_Cmplx_DFT_N=12-4 16048 74256 ns/op 0.16 MB/s 192 B/op 1 allocs/op
BenchmarkCmplxDFT/In_Place_Cmplx_DFT_N=12-4 1490419 803.4 ns/op 14.94 MB/s 0 B/op 0 allocs/op
BenchmarkCmplxDFT/Out_of_Cmplx_Place_DFT_N=12-4 1476600 806.1 ns/op 14.89 MB/s 0 B/op 0 allocs/op
PASS
ok github.com/bemasher/genfft/dft 11.639s