fftn on multiple arrays faster using python loop
mvandeput opened this issue · 2 comments
Running mkl_fft.fftn
on an array with shape (70, 196, 150, 24) only on the last 3 axes is more than 2 times slower than running the transform on each of the 70 sub-arrays individually using a python loop.
This is unexpected as one would assume the loop should actually run faster inside of mkl.
Simple example:
import mkl_fft
import time
shape = (70, 196, 150, 24)
array = np.random.random(shape) + 1j * np.random.random(shape)
def transform(array):
result = np.empty(array.shape, dtype=complex)
for ii, arr in enumerate(array):
result[ii] = mkl_fft.fftn(arr)
return result
t0 = time.time()
a = mkl_fft.fftn(array, axes=(1, 2, 3))
t1 = time.time()
b = transform(array)
t2 = time.time()
print('fftn on full array: {:.0f} ms'.format(1000*(t1 - t0)))
print('loop of fftn on subarray: {:.0f} ms'.format(1000*(t2 - t1)))
print(np.allclose(a, b))
On my machine this returns:
fftn on full array: 1359 ms
loop of fftn on subarray: 619 ms
True
I also verified the timings with timeit
instead of time.time
.
@mvandeput Thank you for the report. It is shortcoming of mkl_fft
implementation of fftn
. If the fftn
call is not a transform of the full array, it simple mindedly uses iteration of 1D transforms.
One can write a function that iteratively calls fftn
on subslices indicated by axes
.
import numpy as np
def flat_to_multi(ind, shape):
"""Maps flat iter index into multi_index"""
nd = len(shape)
m_ind = [-1] * nd
j = ind
for i in range(nd):
si = shape[nd-1-i]
q = j // si
r = j - si * q
m_ind[nd-1-i] = r
j = q
return m_ind
def iter_complementary(x, axes, func, kwargs, result):
if axes is None:
return func(x, **kwargs)
x_shape = x.shape
nd = x.ndim
r = list(range(nd))
sl = [slice(None, None, None)] * nd
if not isinstance(axes, tuple):
axes = (axes,)
for ai in axes:
r[ai] = None
size = 1
sub_shape = []
dual_ind = []
for ri in r:
if ri is not None:
size *= x_shape[ri]
sub_shape.append(x_shape[ri])
dual_ind.append(ri)
for ind in range(size):
m_ind = flat_to_multi(ind, sub_shape)
for k1, k2 in zip(dual_ind, m_ind):
sl[k1] = k2
# N.B.: A copy is being made here, which can be avoided
# mkl_fft could write directly into memory of result
result[tuple(sl)] = func(x[tuple(sl)], **kwargs)
return result
Then one can can speed-up the transform:
shape = (70, 196, 150, 24)
array = np.random.random(shape) + 1j * np.random.random(shape)
t0 = time.time()
res_direct = mkl_fft.fftn(array, axes=(1, 2, 3))
t1 = time.time()
t2 = time.time()
tmp = np.empty(array.shape, dtype='D')
res_new = iter_complementary(array, (1, 2, 3), mkl_fft.fftn, dict(), tmp)
t3 = time.time()
print((t1-t0, t3-t2, np.allclose(res_new, res_direct)))
# outputs (1.9125349521636963, 0.6240551471710205, True)
It could be done more efficiently in C, since one can allocate the result once, and have MKL write into the preallocated result array directly, thus avoiding superfluous copies.
It would be really nice if this logic could be added to irfftn_numpy as well...