OpenMathLib/OpenBLAS

Illegal instruction (core dumped) - numpy np.dot() in Python, ggplot() in R

prokulski opened this issue · 17 comments

This is the whole story from numpy/numpy#11517

NumPy 1.17.3 in Python crashes, and so does ggplot() in R.

Example in Python:

$ python
>>> import numpy as np
>>> A = np.matrix([[1.0], [3.0]])
>>> B = np.matrix([[2.0, 3.0]])
>>> ret = np.dot(A, B)
Illegal instruction (core dumped)

This is probably the reason for the error I caught - Python dies with the "Illegal instruction (core dumped)" when trying to run:

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10,10))

The proposed solution:

import os
os.environ["OPENBLAS_VERBOSE"] = "2"
os.environ["OPENBLAS_CORETYPE"] = "nehalem"
import numpy as np

doesn't work :(

Here is the full story and config:

$ python
Python 3.6.8 (default, Oct  9 2019, 14:04:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import numpy as np
>>> print(sys.version)
3.6.8 (default, Oct  9 2019, 14:04:01)
[GCC 8.3.0]
>>> print(np.__version__)
1.17.3
>>> np.show_config()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

CPU info (this is a virtual machine):

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 1
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 3
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

$ apt list *blas* | grep installed
libblas-dev/bionic,now 3.7.1-4ubuntu1 amd64 [installed,automatic]
libblas3/bionic,now 3.7.1-4ubuntu1 amd64 [installed,automatic]
libgslcblas0/bionic,now 2.4+dfsg-6 amd64 [installed,automatic]
libopenblas-base/bionic,now 0.2.20+ds-4 amd64 [installed,automatic]
libopenblas-dev/bionic,now 0.2.20+ds-4 amd64 [installed]

Following the answers in numpy/numpy#11517, there is more:

In Python:

>>> import numpy as np
>>> A = np.array([[1.0], [3.0]])
>>> B = np.array([[2.0, 3.0]])
>>> ret = np.matmul(A, B)

works fine and gives:

>>> ret
array([[2., 3.],
       [6., 9.]])

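To determine the OpenBLAS version, I used this code:

import ctypes

import numpy

# Load the NumPy extension module; the OpenBLAS symbols are reachable through it.
dll = ctypes.CDLL(numpy.core._multiarray_umath.__file__)
get_config = dll.openblas_get_config
# The function returns a C string, which ctypes exposes as bytes.
get_config.restype = ctypes.c_char_p
res = get_config()
print('OpenBLAS get_config returned', str(res))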

It gives:

OpenBLAS get_config returned b'OpenBLAS 0.3.7 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64'
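
The config string can be split into its parts so that the detected core ("Haswell" here) is easier to spot. This is a minimal sketch that assumes the space-separated layout seen in the output above; `parse_openblas_config` is a hypothetical helper, not part of NumPy or OpenBLAS.

```python
def parse_openblas_config(raw):
    """Split an openblas_get_config() result into version, flags and options.

    Assumes the layout seen in this thread:
    b'OpenBLAS <version> <FLAG> ... <KEY>=<VALUE>'
    """
    tokens = raw.decode("ascii").split()
    info = {"version": None, "flags": [], "options": {}}
    # The first two tokens are expected to be the name and the version.
    if len(tokens) >= 2 and tokens[0] == "OpenBLAS":
        info["version"] = tokens[1]
        tokens = tokens[2:]
    for token in tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            info["options"][key] = value
        else:
            info["flags"].append(token)
    return info


config = parse_openblas_config(
    b"OpenBLAS 0.3.7 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64"
)
print(config["version"], config["flags"], config["options"])
```

With a DYNAMIC_ARCH build the core name appears among the plain flags, so a check such as `"Haswell" in config["flags"]` shows which kernels were selected at load time.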

When I tried:

export OPENBLAS_VERBOSE=2
export OPENBLAS_CORETYPE=prescott
python -c 'import numpy as np; A = np.array([[1.0], [3.0]]); B = np.array([[2.0, 3.0]]); print(np.dot(A, B))'

Everything works fine.

BUT with export OPENBLAS_CORETYPE=prescott set, running R and trying to plot something (via ggplot()) still crashes. It is similar in Jupyter.

I've also pulled and started the Docker image jupyter/tensorflow-notebook on this machine, and the np.dot() example run in Jupyter fails the same way.

I think the problem lies in the KVM virtual machine and bad CPU detection in BLAS. Strangely, a few days earlier everything was OK... and nothing has changed in KVM (support says so).

The cpuinfo output is a bit strange - a Skylake without AVX2 - but the code is supposed to handle that already and revert to Sandybridge kernels. Not sure if that actually happens here though, as your config output still states Haswell. Falling back all the way to the ancient Prescott seems a bit extreme; can you try OPENBLAS_CORETYPE=Sandybridge?

OPENBLAS_CORETYPE=Sandybridge works fine

Our CPU capability check goes by processor model id and then "only" checks the OS support bit to see if AVX2 is actually supported in the current operating system configuration. Perhaps this is where things go awry if KVM is not providing sane information. (Not sure why Nehalem did not work for you earlier, but as that one would not even use AVX it may have hit a similar hole in the SSE support of your KVM setup?)

BTW are you using plain KVM or some added features like Intel KVM-SGX (Software Guard Extensions)?

Unfortunately, I don't know, and I have no way to check it (or maybe I have? How?). The VPS provider doesn't say anything special about this on their pages. I could ask, but about what exactly?

Actually I am not sure myself; I only found a closed issue about broken AVX feature reporting on the KVM-SGX issue tracker here on GitHub. Perhaps asking about the lack of AVX2 support (and, from the Nehalem experiment, probably the lack of some SSE functions as well) in their VM will do?

"Skylake" in KVM means Broadwell, or an AVX2 Skylake without the Spectre microcode.
The proper one is (Skylake IBRS).
The absence of AVX2 could be explained by the virtualizer kernel not having been patched for 10 years. It would be worth getting the cpuid from outside the VM.

I found the culprit: virt-manager running on CentOS 6 creates machines (well, XML files) with host-model but with the avx2 and newer bits masked out. I think those XML files will be around for a while. @prokulski thanks for the report; it has to be addressed here, as was pointed out at numpy.

More testing: VMware is not affected; VMX09 reduces the CPUID model/stepping numbers to ones that do not have AVX2, so it comes out as Ivy Bridge/SANDYBRIDGE.

@brada4 I do not think the choice of "Skylake" over "Skylake IBRS" would matter here (assuming that the VPS provider does proper housekeeping on the actual hardware). Do you have a link for the CentOS issue? (What I find strange is that our check for AVX2 availability via cpuid(eax=7,...) appears to return true although it is missing from the feature flags list in /proc/cpuinfo.)
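
For readers unfamiliar with the bits involved, the generic AVX2 detection sequence described in Intel's manuals can be sketched as below. This is not OpenBLAS's actual code: `avx2_usable` is a hypothetical helper, and since Python cannot execute `cpuid` or `xgetbv` itself, the register values are passed in as plain integers.

```python
# Bit positions as documented for CPUID and XCR0.
AVX_BIT = 1 << 28       # CPUID.(EAX=1):ECX bit 28, CPU supports AVX
OSXSAVE_BIT = 1 << 27   # CPUID.(EAX=1):ECX bit 27, OS has enabled XSAVE
AVX2_BIT = 1 << 5       # CPUID.(EAX=7,ECX=0):EBX bit 5, CPU supports AVX2
XCR0_SSE_AVX = 0b110    # XCR0 bits 1 (SSE state) and 2 (AVX state)


def avx2_usable(leaf1_ecx, leaf7_ebx, xcr0):
    """Return True only if both the CPU and the OS allow AVX2 code to run."""
    cpu_has_avx = bool(leaf1_ecx & AVX_BIT)
    os_uses_xsave = bool(leaf1_ecx & OSXSAVE_BIT)
    # The OS must save and restore the YMM registers on context switches.
    os_saves_ymm = (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX
    cpu_has_avx2 = bool(leaf7_ebx & AVX2_BIT)
    return cpu_has_avx and os_uses_xsave and os_saves_ymm and cpu_has_avx2


# Illustrative register values, not taken from the machine in this issue.
print(avx2_usable(AVX_BIT | OSXSAVE_BIT, AVX2_BIT, 0b111))
print(avx2_usable(AVX_BIT | OSXSAVE_BIT, 0, 0b111))
```

A hypervisor that masks the leaf 7 bit makes the last condition false, which is what the /proc/cpuinfo flags in this report would suggest; the puzzle in this thread is that the library's own check apparently saw it as true.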

It is an unsupported ("extras") component, and the rest of RHEL6 goes EOL in a few weeks...
What it does: by default it masks out all ID bits unknown to RHEL6U0 but keeps the native CPU trade name and stepping.

There are release notes about AVX2 and AVX512 support being added to the kernel, QEMU and KVM, but not to the virt-manager UI package.

It was not introduced by CentOS; it is something Red Hat never changed because no paying customer asked. Using a newer client produces proper XML.

Seems a bit unlikely that a service provider would build his business around an outdated and almost EOL operating system release - or are you saying it is still the same in current RHEL ?

Nope, all later versions are OK. The package is provided as unsupported with RHEL6, whose support ends on November 30, 2019.
... You start a business 10 years ago, you make the best template possible, then just deploy, deploy, deploy... until somebody notices...

> Seems a bit unlikely that a service provider would build his business around an outdated and almost EOL operating system release - or are you saying it is still the same in current RHEL ?

Support said (of course) "we are constantly updating our servers so this point can be rejected"

The only fix in the context of bionic LTS is setting OPENBLAS_CORETYPE=Sandybridge until AVX2 gets properly exposed to the virtual machine.
There is a problem with OpenBLAS CPU detection: it emits AVX2 instructions whenever the particular CPU model allows it, without checking the CPUID bits, which show none.
Even when that is fixed, the fix is unlikely to reach the LTS version you have. It is quite possible that the IBRS CPUID bit got lost in the same place as AVX2; you cannot control that from inside the VM.
If you can, power the machine off and on (Amazon etc. did that for Meltdown at least); QEMU does not update CPUID over plain reboots inside the VM.

> There is a problem with OpenBLAS CPU detection: it emits AVX2 instructions whenever the particular CPU model allows it, without checking the CPUID bits, which show none.

The point is that we do the cpuid check and it appears to return true in that KVM setup, despite the "fake" /proc/cpuinfo output not containing the avx2 capability flag. The only chance to catch something like this would appear to be parsing /proc/cpuinfo to cross-check the result of the cpuid() call, and I am not yet convinced it is worth the trouble just to support the occasional mis-configured VM.
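
Such a cross-check could look roughly like the sketch below. It only illustrates the idea and is not code from OpenBLAS; `cpuinfo_flags` and `cross_check_avx2` are hypothetical names, and the sample text is a shortened version of the flags line from this report.

```python
def cpuinfo_flags(text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cross_check_avx2(cpuid_says_avx2, cpuinfo_text):
    """Trust the cpuid result only if the kernel's flag list agrees."""
    return cpuid_says_avx2 and "avx2" in cpuinfo_flags(cpuinfo_text)


# Shortened sample of the flags line reported for this VM (no avx2).
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx f16c rdrand\n"
print(cross_check_avx2(True, sample))
```

On a real Linux system the text would come from reading /proc/cpuinfo; the first `flags` line is enough when all cores report the same features, as they do in the output above.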

I did not spot that in the text at first... OK, so the advice for the OP is to rebuild the vendor's package with NO_AVX2=1 supplied, to avoid the rough edges of their system?

The advice remains to use OPENBLAS_CORETYPE=Sandybridge (so no rebuild is needed) on that particular VS until the service provider (hopefully) identifies the source of the inconsistency. However, we do not know whether the VS contract promised Intel Skylake or AVX2 capability, or just some unspecified type of compute node.
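
Because the variable has to be in the environment before OpenBLAS is loaded, one way to apply the workaround from Python is to start the numerical code in a child process. A minimal sketch, in which `run_with_coretype` is a hypothetical helper and the child snippet only echoes the variable instead of importing numpy:

```python
import os
import subprocess
import sys


def run_with_coretype(code, coretype="Sandybridge"):
    """Run a Python snippet in a child process with OPENBLAS_CORETYPE set."""
    env = dict(os.environ, OPENBLAS_CORETYPE=coretype)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text output; also works on Python 3.6
    )
    return result.returncode, result.stdout


# Stand-in for the real workload: show what the child process sees.
returncode, output = run_with_coretype(
    "import os; print(os.environ['OPENBLAS_CORETYPE'])"
)
print(returncode, output.strip())
```

Replacing the snippet with the `np.dot()` example from this report would test the workaround itself. On POSIX systems a negative return code means the child was killed by a signal, which is how an illegal instruction in the child would show up.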