pydata/numexpr

expm1 low accuracy for complex argument

thangleiter opened this issue · 8 comments

expm1(1j*x) has the same (low) accuracy for small complex arguments as exp(1j*x) - 1. For real arguments, the accuracy is as expected and on par with numpy:
[figure: expm1(x) and -real(expm1(1j*x)) computed with NumPy, naive exp - 1, and NumExpr, with |np - ne| error panels]

Code to reproduce:
import matplotlib.pyplot as plt
import numpy as np
import numexpr as ne

x = np.geomspace(1e-18, 1e-6, 101)

numpy = {}
numpy_naive = {}
numexpr = {}

numexpr['float'] = ne.evaluate('expm1(x)')
numpy['float'] = np.expm1(x)
numpy_naive['float'] = np.exp(x) - 1

numexpr['complex'] = ne.evaluate('expm1(1j*x)')
numpy['complex'] = np.expm1(1j*x)
numpy_naive['complex'] = np.exp(1j*x) - 1

fig, ax = plt.subplots(2, 2, sharex=True, constrained_layout=True)
ax[0, 0].loglog(x, numpy['float'].real, label='numpy')
ax[0, 0].loglog(x, numpy_naive['float'].real, '-.', label='numpy naive')
ax[0, 0].loglog(x, numexpr['float'].real, '--', label='numexpr')
ax[1, 0].loglog(x, abs((numpy['float'] - numexpr['float']).real), label='|np - ne| (expm1)')
ax[0, 0].legend()
ax[1, 0].legend()
ax[0, 0].grid(True)
ax[1, 0].grid(True)

ax[0, 1].loglog(x, -numpy['complex'].real, label='numpy')
ax[0, 1].loglog(x, -numpy_naive['complex'].real, '-.', label='numpy naive')
ax[0, 1].loglog(x, -numexpr['complex'].real, '--', label='numexpr')
ax[1, 1].loglog(x, abs((numpy['complex'] - numexpr['complex']).real), label='|np - ne| (expm1)')
ax[0, 1].legend()
ax[1, 1].legend()
ax[0, 1].grid(True)
ax[1, 1].grid(True)

ax[0, 0].set_title('expm1(x)')
ax[0, 1].set_title('-real(expm1(1j*x))')
ax[1, 0].set_yscale('linear')
ax[1, 0].set_xlabel('x')
ax[1, 1].set_xlabel('x')

NumExpr appears to have been intentionally implemented with fast but low-precision complex functions. If you're interested, you can browse them here:

https://github.com/pydata/numexpr/blob/master/numexpr/complex_functions.hpp

If you want high-precision complex algebra in NumExpr, the easiest fix would be to compile with Intel MKL support (or use the conda distribution), since with MKL you have the option to set the precision for complex operations, e.g.:

import numexpr as ne
ne.set_vml_accuracy_mode('high')

Vanilla NumExpr is roughly equivalent in precision to the MKL 'fast' mode.
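
Before trying the accuracy modes, a quick way to check whether your NumExpr build has VML/MKL support at all (a minimal sketch using the public use_vml flag):

import numexpr as ne

print(ne.use_vml)  # False means no MKL/VML support was compiled in
if ne.use_vml:
    ne.set_vml_accuracy_mode('high')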

I see, thanks. I think this is quite misleading, since the whole point of a dedicated expm1 is improved accuracy for small arguments thanks to special handling of the problematic computations (cos(x) - 1 near x = 0 in the purely imaginary case). To my understanding, increasing the floating-point precision only moves the crossover point where the accuracy breaks down. As it stands, expm1 has the same speed and accuracy as exp - 1 for complex input, which makes it kind of pointless.
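
To illustrate the cancellation (a minimal sketch, not from the issue): for small x, cos(x) - 1 loses essentially all significant digits, while the algebraically equivalent -2*sin(x/2)**2 stays accurate:

import numpy as np

x = 1e-9
naive = np.cos(x) - 1.0            # rounds to exactly 0.0 in double precision
stable = -2.0 * np.sin(x / 2)**2   # about -5e-19, correct to machine precision
print(naive, stable)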

Taking a look at how expm1 is implemented for complex types in NumPy makes me think this should be easy to do here as well:

static void
nc_expm1@c@(@ctype@ *x, @ctype@ *r)
{
    @ftype@ a = npy_sin@c@(x->imag / 2);
    r->real = npy_expm1@c@(x->real) * npy_cos@c@(x->imag) - 2 * a * a;
    r->imag = npy_exp@c@(x->real) * npy_sin@c@(x->imag);
    return;
}
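
For reference, the same identity checked quickly in Python/NumPy (just a sanity check of the formula, not the proposed patch):

import numpy as np

z = 1e-12 + 1e-12j
s = np.sin(z.imag / 2)
stable = np.expm1(z.real) * np.cos(z.imag) - 2 * s * s + 1j * np.exp(z.real) * np.sin(z.imag)
print(stable, np.expm1(z))  # the two should agree to machine precision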

I'd be happy to open a PR if you agree.

Sure a PR would be fine by me.

Just getting around to looking at this now. I found one unfortunate additional issue: for MKL, interpreter.cpp isn't using a 'nice' implementation:

#ifdef USE_VML
/* complex expm1 not available in VML */
static void vzExpm1(MKL_INT n, const MKL_Complex16* x1, MKL_Complex16* dest)
{
    MKL_INT j;
    vzExp(n, x1, dest);
    for (j=0; j<n; j++) {
        dest[j].real -= 1.0;
    };
};

vzExp is the MKL complex exponential function, but there's no corresponding vzExpm1 function. I'm not sure why; I'll have to write some test code here.
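
A rough Python analogue of what that fallback does (illustrative only), showing that subtracting 1 after the full exp inherits the cancellation in the real part:

import numpy as np

z = 1j * 1e-9
shim_style = np.exp(z) - 1.0    # real part collapses to 0.0
reference = np.expm1(z)         # real part is about -5e-19
print(shim_style.real, reference.real)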

@thangleiter so I have not been able to replicate your problem. I made a new branch and pushed changes to it so I can test both functions simultaneously. Here's my test script:

https://github.com/pydata/numexpr/blob/issue418/issues/issue418.py

As you can see I simultaneously have both versions of the function implemented. Would you look at it and see if I am doing anything wrong in my evaluation?

When you found problems with NumExpr in the initial issue report, were you using a NumExpr compiled with Intel MKL support? Either from conda or Gohlke's repo?

@thangleiter also which compiler are you using? We can check with Godbolt if there's some compiler optimization doing something strange.

> @thangleiter so I have not been able to replicate your problem. I made a new branch and pushed changes to it so I can test both functions simultaneously. Here's my test script:
>
> https://github.com/pydata/numexpr/blob/issue418/issues/issue418.py
>
> As you can see I simultaneously have both versions of the function implemented. Would you look at it and see if I am doing anything wrong in my evaluation?

The implementation of expm1x looks good to me. However, I am noticing some really strange behavior when testing with both expm1 and expm1x. Somehow, evaluate('expm1x(z)') gives different results depending on the implementation of nc_expm1. I.e., if the latter is implemented as

static void
nc_expm1(npy_cdouble *x, npy_cdouble *r)
{
    double a = exp(x->real);
    r->real = a*cos(x->imag) - 1.0;  /* cos(im) - 1 cancels badly for small im */
    r->imag = a*sin(x->imag);
    return;
}

the real part of expm1x is inaccurate, whereas if it is implemented as

static void
nc_expm1(npy_cdouble *x, npy_cdouble *r)
{
    double a = sin(x->imag / 2);
    double b = exp(x->real);
    r->real = expm1(x->real) * cos(x->imag) - 2 * a * a;  /* avoids the cancellation */
    r->imag = b * sin(x->imag);
    return;
}

everything is accurate compared to np.expm1. So it looks like evaluate('expm1x(z)') actually uses nc_expm1.
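
A quick way to check this (a sketch; expm1x is the temporary name from the issue418 branch and is not in released versions):

import numpy as np
import numexpr as ne

z = 1j * np.geomspace(1e-18, 1e-6, 5)
print(ne.evaluate('expm1(z)').real)
print(ne.evaluate('expm1x(z)').real)  # identical output suggests both dispatch to the same C function
print(np.expm1(z).real)               # reference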

> When you found problems with NumExpr in the initial issue report, were you using a NumExpr compiled with Intel MKL support? Either from conda or Gohlke's repo?

I am using NumExpr from conda but don't use MKL:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   2.8.4dev1
NumPy version:     1.23.1
Python version:    3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
Platform:          win32-AMD64-10.0.22000
CPU vendor:        AuthenticAMD
CPU model:         AMD Ryzen 7 PRO 4750U with Radeon Graphics
CPU clock speed:   1697 MHz
VML available?     False
Number of threads used by default: 8 (out of 16 detected cores)
Maximum number of threads: 64
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

> @thangleiter also which compiler are you using? We can check with Godbolt if there's some compiler optimization doing something strange.

I tested this on both MSVC143 and g++.

Ok, sorry for the long delay; I got frustrated with the obtuse implementation of NumExpr, eventually gave up, and just overwrote the complex exp function to do a comparison. Your function is 25% slower, but it is indeed significantly more accurate for the real component over a wide range of values. Therefore I have merged your changes into master. I'll prepare a new release immediately since Python 3.11 is out.