OpenBLAS is suspiciously slow (wrt. BLIS/MKL on AMD)
- I read the conda-forge documentation and could not find the solution for my problem there.
Issue
OpenBLAS is suspiciously slow in numpy (order of magnitude slower than both BLIS and MKL, on an AMD 3950x!).
Steps
- Create an MKL environment:
conda create -n mkl numpy mkl
- Create a BLIS environment:
conda create -n blis numpy blis nomkl
- Create an OpenBLAS environment:
conda create -n openblas numpy openblas nomkl
- Start a jupyter notebook/lab (in each environment, separately):
$ OMP_NUM_THREADS=1 BLIS_NUM_THREADS=1 MKL_NUM_THREADS=1 jupyter lab
- Run the following code to get timings:
import numpy as np
sizes = (1, 2, 3, 4, 32, 64, 127, 128, 129, 1023, 1024, 1025, 4096, 4096*2-1, 4096*2, 4096*2+1)
best_times = np.zeros(len(sizes))
for i, s in enumerate(sizes):
    arr = np.random.rand(s, s)
    arrT = np.random.rand(s, s)
    t = %timeit -o arr @ arrT
    best_times[i] = t.best
I checked that CPU usage never exceeded 100.0 in top in all cases, throughout the full benchmark, until the very end.
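For readers who want to reproduce the timings outside Jupyter, here is a minimal sketch of a stand-in for `%timeit -o ... .best` using only the standard library. The helper name `best_time` and its defaults are assumptions made for illustration, not part of the original report; it takes the minimum over several repeats, which is what the `best` attribute is understood to report.

```python
import timeit


def best_time(fn, repeat=7, number=1):
    """Return the best (minimum) per-call time in seconds.

    `fn` is a zero-argument callable; it is called `number` times per
    run, and the fastest of `repeat` runs is kept.
    """
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    return min(runs) / number


# Hypothetical usage inside the benchmark loop (requires numpy):
#     best_times[i] = best_time(lambda: arr @ arrT)
```

Taking the minimum rather than the mean reduces the influence of unrelated system noise on the comparison between BLAS implementations.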
Result
The last point (s = 4096*2+1) takes around 25 s with both MKL and BLIS; it takes 3 min 30 s with OpenBLAS. The last time I did something similar, OpenBLAS was on par with MKL. Again, I insist: CPU usage was capped at 100% in all cases, so there is no underlying multithreading here.
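To put these numbers in perspective, the following sketch converts the reported timings into a slowdown factor and an approximate throughput. It assumes the usual 2*n^3 floating-point operation count for a dense n x n matrix product, and it treats the timings quoted above as rough values rather than precise measurements.

```python
def matmul_gflops(n, seconds):
    """Approximate GFLOP/s for an n x n @ n x n product (2*n**3 flops)."""
    return 2 * n**3 / seconds / 1e9


n = 4096 * 2 + 1        # last size in the benchmark
t_fast = 25.0           # approximate MKL/BLIS time from the report, in seconds
t_slow = 3 * 60 + 30    # approximate "OpenBLAS" time from the report, in seconds

slowdown = t_slow / t_fast
print(f"slowdown: {slowdown:.1f}x")
print(f"fast: {matmul_gflops(n, t_fast):.1f} GFLOP/s")
print(f"slow: {matmul_gflops(n, t_slow):.1f} GFLOP/s")
```

A single-threaded gap of this size is more typical of an unoptimized reference implementation than of a difference between two tuned libraries.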
Conda environment
Environment (conda list):
$ conda list
[...]
openblas 0.3.17 pthreads_h4748800_0 conda-forge
[...]
Full list here:
$ conda list
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
anyio 3.3.0 py39hf3d152e_0 conda-forge
argon2-cffi 20.1.0 py39h3811e60_2 conda-forge
async_generator 1.10 py_0 conda-forge
atk-1.0 2.36.0 h3371d22_4 conda-forge
attrs 21.2.0 pyhd8ed1ab_0 conda-forge
babel 2.9.1 pyh44b312d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
bleach 3.3.1 pyhd8ed1ab_0 conda-forge
brotlipy 0.7.0 py39h3811e60_1001 conda-forge
ca-certificates 2021.5.30 ha878542_0 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2021.5.30 py39hf3d152e_0 conda-forge
cffi 1.14.6 py39he32792d_0 conda-forge
chardet 4.0.0 py39hf3d152e_1 conda-forge
charset-normalizer 2.0.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 3.4.7 py39hbca0aa6_0 conda-forge
cycler 0.10.0 py_2 conda-forge
dbus 1.13.6 h48d8840_2 conda-forge
debugpy 1.4.1 py39he80948d_0 conda-forge
decorator 5.0.9 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
gdk-pixbuf 2.42.6 h04a7f16_0 conda-forge
gettext 0.19.8.1 h0b5b191_1005 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
glib 2.68.3 h9c3ff4c_0 conda-forge
glib-tools 2.68.3 h9c3ff4c_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
graphviz 2.48.0 h85b4f2f_0 conda-forge
gst-plugins-base 1.18.4 hf529b03_2 conda-forge
gstreamer 1.18.4 h76c114f_2 conda-forge
gtk2 2.24.33 h539f30e_1 conda-forge
gts 0.7.6 h64030ff_2 conda-forge
harfbuzz 2.8.2 h83ec7ef_0 conda-forge
icc_rt 2020.2 intel_254 numba
icu 68.1 h58526e2_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
importlib-metadata 4.6.1 py39hf3d152e_0 conda-forge
ipykernel 6.0.3 py39hef51801_0 conda-forge
ipython 7.25.0 py39hef51801_1 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.18.0 py39hf3d152e_2 conda-forge
jinja2 3.0.1 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 pyhd8ed1ab_3 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py39hf3d152e_0 conda-forge
jupyter_server 1.10.1 pyhd8ed1ab_0 conda-forge
jupyterlab 3.0.16 pyhd8ed1ab_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 2.6.1 pyhd8ed1ab_0 conda-forge
kiwisolver 1.3.1 py39h1a9c180_1 conda-forge
krb5 1.19.1 hcc1bbae_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_1 conda-forge
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 5_h92ddd45_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libclang 11.1.0 default_ha53f305_1 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libevent 2.1.10 hcdb4288_3 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 11.1.0 hc902ee8_2 conda-forge
libgd 2.3.2 h78a0170_0 conda-forge
libgfortran-ng 11.1.0 h69a702a_0 conda-forge
libgfortran5 11.1.0 h6c583b3_0 conda-forge
libglib 2.68.3 h3e27bee_0 conda-forge
libgomp 11.1.0 hc902ee8_2 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libogg 1.3.4 h7f98852_1 conda-forge
libopenblas 0.3.17 pthreads_h8fe5266_0 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.3 hd57d9b9_0 conda-forge
librsvg 2.50.7 hc3c00ef_0 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-ng 11.1.0 h56837e0_2 conda-forge
libtiff 4.3.0 hf544144_1 conda-forge
libtool 2.4.6 h58526e2_1007 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp 1.2.0 h3452ae3_0 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
llvmlite 0.37.0rc2 py39hf484d3e_0 numba
lz4-c 1.9.3 h9c3ff4c_0 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
matplotlib 3.4.2 py39hf3d152e_0 conda-forge
matplotlib-base 3.4.2 py39h2fa2bec_0 conda-forge
matplotlib-inline 0.1.2 pyhd8ed1ab_2 conda-forge
mistune 0.8.4 py39h3811e60_1004 conda-forge
mysql-common 8.0.25 ha770c72_2 conda-forge
mysql-libs 8.0.25 hfa10184_2 conda-forge
nbclassic 0.3.1 pyhd8ed1ab_1 conda-forge
nbclient 0.5.3 pyhd8ed1ab_0 conda-forge
nbconvert 6.1.0 py39hf3d152e_0 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.1 pyhd8ed1ab_0 conda-forge
nomkl 1.0 h5ca1d4c_0 conda-forge
notebook 6.4.0 pyha770c72_0 conda-forge
nspr 4.30 h9c3ff4c_0 conda-forge
nss 3.67 hb5efdd6_0 conda-forge
numba 0.54.0rc1 np1.16py3.9hc547734_g9bed2ebb2_0 numba
numpy 1.21.1 py39hdbf815f_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openblas 0.3.17 pthreads_h4748800_0 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1k h7f98852_0 conda-forge
packaging 21.0 pyhd8ed1ab_0 conda-forge
pandoc 2.14.1 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
pango 1.48.7 hb8ff022_0 conda-forge
parso 0.8.2 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py39ha612740_0 conda-forge
pip 21.2.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
prometheus_client 0.11.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.19 pyha770c72_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.9.0 pyhd8ed1ab_0 conda-forge
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py39hf3d152e_7 conda-forge
pyqt-impl 5.12.3 py39h0fcd23e_7 conda-forge
pyqt5-sip 4.19.18 py39he80948d_7 conda-forge
pyqtchart 5.12 py39h0fcd23e_7 conda-forge
pyqtwebengine 5.12.1 py39h0fcd23e_7 conda-forge
pyrsistent 0.17.3 py39h3811e60_2 conda-forge
pysocks 1.7.1 py39hf3d152e_3 conda-forge
python 3.9.6 h49503c6_1_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.1 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py39h3811e60_0 conda-forge
pyzmq 22.1.0 py39h37b5a0c_0 conda-forge
qt 5.12.9 hda022c4_4 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.26.0 pyhd8ed1ab_0 conda-forge
requests-unixsocket 0.2.0 py_0 conda-forge
roctools 0.0.0 hf484d3e_1 numba
scipy 1.7.0 py39hee8e79c_1 conda-forge
send2trash 1.7.1 pyhd8ed1ab_0 conda-forge
setuptools 49.6.0 py39hf3d152e_3 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sniffio 1.2.0 py39hf3d152e_1 conda-forge
sqlite 3.36.0 h9cd32fc_0 conda-forge
tbb 2021.1.1 intel_119 numba
terminado 0.10.1 py39hf3d152e_0 conda-forge
testpath 0.5.0 pyhd8ed1ab_0 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
traitlets 5.0.5 py_0 conda-forge
tzdata 2021a he74cb21_1 conda-forge
urllib3 1.26.6 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
websocket-client 0.57.0 py39hf3d152e_4 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zeromq 4.3.4 h9c3ff4c_0 conda-forge
zipp 3.5.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge
Details about conda and system (conda info):
$ conda info
active environment : base
active env location : /home/user/Documents/Programming/Toolchains/miniconda3
shell level : 1
user config file : /home/user/.condarc
populated config files :
conda version : 4.10.3
conda-build version : not installed
python version : 3.8.10.final.0
virtual packages : __linux=5.13.4=0
__glibc=2.33=0
__unix=0=0
__archspec=1=x86_64
base environment : /home/user/Documents/Programming/Toolchains/miniconda3 (writable)
conda av data dir : /home/user/Documents/Programming/Toolchains/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/user/Documents/Programming/Toolchains/miniconda3/pkgs
/home/user/.conda/pkgs
envs directories : /mnt/scratch/user/Programming/Conda/envs
/home/user/Documents/Programming/Toolchains/miniconda3/envs
/home/user/.conda/envs
platform : linux-64
user-agent : conda/4.10.3 requests/2.25.1 CPython/3.8.10 Linux/5.13.4-arch2-1 arch/ glibc/2.33
UID:GID : 1000:1000
netrc file : None
offline mode : False
> Create an MKL environment: conda create -n mkl numpy mkl
> Create a BLIS environment: conda create -n blis numpy blis nomkl
> Create an OpenBLAS environment: conda create -n openblas numpy openblas nomkl
This is not the correct way. Please see our docs on how to switch blas implementation.
What are you talking about?
The point is not how to switch implementations in the most comfortable way (feel free to use whichever method you prefer to switch).
The point is about this OpenBLAS being much slower than BLIS, which is not how things used to be.
> The point is not how to switch implementations in the most comfortable way
I didn't say whether it was comfortable or not. I said it's not correct, which means it's wrong. The conda list output you showed has the following:
libblas 3.9.0 5_h92ddd45_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
which means that you are not using openblas and are instead using netlib's reference lapack, which is slow. You have both netlib and openblas installed, but numpy is using the netlib one.
Please use the recommended way to switch blas implementation and you'll be able to get an environment where numpy uses openblas.
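The check described above can be sketched in a few lines: scan `conda list` output for the BLAS interface packages and read the implementation name from the end of the build string. The function name `blas_variants` is hypothetical, and the sketch assumes the convention seen in this listing, where the build string ends in the implementation name (for example `5_h92ddd45_netlib`).

```python
BLAS_INTERFACE_PACKAGES = ("libblas", "libcblas", "liblapack")


def blas_variants(conda_list_output):
    """Map each BLAS interface package to the trailing token of its build string."""
    variants = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Expected columns: name, version, build, channel
        if len(fields) >= 3 and fields[0] in BLAS_INTERFACE_PACKAGES:
            variants[fields[0]] = fields[2].rsplit("_", 1)[-1]
    return variants


# Lines copied from the environment listing in this issue.
sample = """\
libblas 3.9.0 5_h92ddd45_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libopenblas 0.3.17 pthreads_h8fe5266_0 conda-forge
"""

print(blas_variants(sample))
# {'libblas': 'netlib', 'libcblas': 'netlib', 'liblapack': 'netlib'}
```

Here `libopenblas` is installed, but every interface package resolves to `netlib`, which matches the diagnosis above. The conda-forge documentation describes selecting the implementation by pinning the `libblas` build string, for example `conda install "libblas=*=*openblas"`.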
Why can't openblas require/pull the correct libblas?
Well, at least I suppose this solves this specific bug request, though it sounds like improper libblas versions should be made to conflict with mismatching BLAS implementations.