/matmul

Matrix-matrix multiplication implementations benchmarking

Primary LanguageRust

Matrix-matrix multiplication implementations benchmarking

png

shapes language matmul_impl min
2_4_4_2 rust naive 4.1e-08
2_4_4_2 rust loop_interchange 5.5e-08
2_4_4_2 rust loop_interchange_uncheck 9.3e-08
2_4_4_2 rust loop_interchange_iterators 9.5e-08
2_4_4_2 python naive_numba 1.134e-06
2_4_4_2 python loop_interchange_numba 1.138e-06
2_4_4_2 python numpy_dot_OPENBLAS_NUM_THREADS=4 1.648e-06
2_4_4_2 python numpy_dot_OPENBLAS_NUM_THREADS=1 1.681e-06
2_4_4_2 python naive 9.084e-06
4_4_4_4 rust naive 9.2e-08
4_4_4_4 rust loop_interchange 9.4e-08
4_4_4_4 rust loop_interchange_iterators 1.28e-07
4_4_4_4 rust loop_interchange_uncheck 1.43e-07
4_4_4_4 python loop_interchange_numba 1.196e-06
4_4_4_4 python naive_numba 1.21e-06
4_4_4_4 python numpy_dot_OPENBLAS_NUM_THREADS=1 1.67e-06
4_4_4_4 python numpy_dot_OPENBLAS_NUM_THREADS=4 1.71e-06
4_4_4_4 python naive 3.2999e-05
4_8_8_4 rust naive 1.36e-07
4_8_8_4 rust loop_interchange 1.81e-07
4_8_8_4 rust loop_interchange_iterators 1.84e-07
4_8_8_4 rust loop_interchange_uncheck 3.16e-07
4_8_8_4 python loop_interchange_numba 1.3e-06
4_8_8_4 python naive_numba 1.313e-06
4_8_8_4 python numpy_dot_OPENBLAS_NUM_THREADS=1 1.68e-06
4_8_8_4 python numpy_dot_OPENBLAS_NUM_THREADS=4 1.699e-06
4_8_8_4 python naive 6.111e-05
8_8_8_8 rust naive 4.36e-07
8_8_8_8 rust loop_interchange_iterators 4.71e-07
8_8_8_8 rust loop_interchange 5.66e-07
8_8_8_8 rust loop_interchange_uncheck 6.38e-07
8_8_8_8 python numpy_dot_OPENBLAS_NUM_THREADS=1 1.731e-06
8_8_8_8 python numpy_dot_OPENBLAS_NUM_THREADS=4 1.738e-06
8_8_8_8 python naive_numba 1.851e-06
8_8_8_8 python loop_interchange_numba 2.717e-06
8_8_8_8 python naive 0.000251612
8_16_16_8 rust naive 8.04e-07
8_16_16_8 rust loop_interchange_iterators 9.02e-07
8_16_16_8 rust loop_interchange 1.202e-06
8_16_16_8 rust loop_interchange_uncheck 1.734e-06
8_16_16_8 python numpy_dot_OPENBLAS_NUM_THREADS=1 1.868e-06
8_16_16_8 python numpy_dot_OPENBLAS_NUM_THREADS=4 1.891e-06
8_16_16_8 python naive_numba 2.315e-06
8_16_16_8 python loop_interchange_numba 3.733e-06
8_16_16_8 python naive 0.000484225
16_16_16_16 rust loop_interchange_iterators 1.663e-06
16_16_16_16 rust loop_interchange_uncheck 2.172e-06
16_16_16_16 python numpy_dot_OPENBLAS_NUM_THREADS=1 2.289e-06
16_16_16_16 python numpy_dot_OPENBLAS_NUM_THREADS=4 2.311e-06
16_16_16_16 rust naive 2.954e-06
16_16_16_16 rust loop_interchange 4.16e-06
16_16_16_16 python naive_numba 5.955e-06
16_16_16_16 python loop_interchange_numba 6.311e-06
16_16_16_16 python naive 0.00202464
16_32_32_16 python numpy_dot_OPENBLAS_NUM_THREADS=4 2.724e-06
16_32_32_16 rust loop_interchange_uncheck 2.857e-06
16_32_32_16 rust loop_interchange_iterators 2.947e-06
16_32_32_16 python numpy_dot_OPENBLAS_NUM_THREADS=1 3.694e-06
16_32_32_16 rust naive 6.585e-06
16_32_32_16 rust loop_interchange 7.253e-06
16_32_32_16 python naive_numba 1.0084e-05
16_32_32_16 python loop_interchange_numba 1.2496e-05
16_32_32_16 python naive 0.00397362
32_32_32_32 python numpy_dot_OPENBLAS_NUM_THREADS=4 5.268e-06
32_32_32_32 python numpy_dot_OPENBLAS_NUM_THREADS=1 6.39e-06
32_32_32_32 rust loop_interchange_iterators 8.79e-06
32_32_32_32 rust loop_interchange_uncheck 1.0081e-05
32_32_32_32 rust loop_interchange 3.1189e-05
32_32_32_32 rust naive 3.1453e-05
32_32_32_32 python naive_numba 3.6077e-05
32_32_32_32 python loop_interchange_numba 4.2312e-05
32_32_32_32 python naive 0.0151586
32_64_64_32 python numpy_dot_OPENBLAS_NUM_THREADS=1 7.993e-06
32_64_64_32 python numpy_dot_OPENBLAS_NUM_THREADS=4 8.106e-06
32_64_64_32 rust loop_interchange_iterators 1.5195e-05
32_64_64_32 rust loop_interchange_uncheck 1.7065e-05
32_64_64_32 rust loop_interchange 5.527e-05
32_64_64_32 rust naive 5.6308e-05
32_64_64_32 python naive_numba 7.433e-05
32_64_64_32 python loop_interchange_numba 8.1226e-05
32_64_64_32 python naive 0.0332958
64_64_64_64 python numpy_dot_OPENBLAS_NUM_THREADS=1 2.1443e-05
64_64_64_64 python numpy_dot_OPENBLAS_NUM_THREADS=4 2.4962e-05
64_64_64_64 rust loop_interchange_uncheck 5.7117e-05
64_64_64_64 rust loop_interchange_iterators 6.2439e-05
64_64_64_64 rust naive 0.000233358
64_64_64_64 rust loop_interchange 0.000243321
64_64_64_64 python naive_numba 0.000325634
64_64_64_64 python loop_interchange_numba 0.000330492
64_64_64_64 python naive 0.142963
64_128_128_64 python numpy_dot_OPENBLAS_NUM_THREADS=4 2.4557e-05
64_128_128_64 python numpy_dot_OPENBLAS_NUM_THREADS=1 3.2402e-05
64_128_128_64 rust loop_interchange_uncheck 0.000114561
64_128_128_64 rust loop_interchange_iterators 0.000125233
64_128_128_64 rust loop_interchange 0.000433922
64_128_128_64 rust naive 0.000527876
64_128_128_64 python naive_numba 0.000656887
64_128_128_64 python loop_interchange_numba 0.000789128
64_128_128_64 python naive 0.269775
128_128_128_128 python numpy_dot_OPENBLAS_NUM_THREADS=4 6.9473e-05
128_128_128_128 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.00011069
128_128_128_128 rust loop_interchange_uncheck 0.000403372
128_128_128_128 rust loop_interchange_iterators 0.000474695
128_128_128_128 rust loop_interchange 0.00198741
128_128_128_128 rust naive 0.00211838
128_128_128_128 python loop_interchange_numba 0.00253365
128_128_128_128 python naive_numba 0.00298101
128_128_128_128 python naive 1.02076
128_256_256_128 python numpy_dot_OPENBLAS_NUM_THREADS=4 0.000115107
128_256_256_128 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.000200865
128_256_256_128 rust loop_interchange_uncheck 0.000957269
128_256_256_128 rust loop_interchange_iterators 0.00100693
128_256_256_128 rust loop_interchange 0.00397211
128_256_256_128 rust naive 0.0049346
128_256_256_128 python loop_interchange_numba 0.00500387
128_256_256_128 python naive_numba 0.00654814
128_256_256_128 python naive 1.93112
256_256_256_256 python numpy_dot_OPENBLAS_NUM_THREADS=4 0.000370298
256_256_256_256 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.00074354
256_256_256_256 rust loop_interchange_uncheck 0.00362797
256_256_256_256 rust loop_interchange_iterators 0.00419063
256_256_256_256 rust loop_interchange 0.0156426
256_256_256_256 python loop_interchange_numba 0.0203815
256_256_256_256 rust naive 0.0249146
256_256_256_256 python naive_numba 0.0300829
256_256_256_256 python naive 7.73768
256_512_512_256 python numpy_dot_OPENBLAS_NUM_THREADS=4 0.000755285
256_512_512_256 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.00172441
256_512_512_256 rust loop_interchange_uncheck 0.00688004
256_512_512_256 rust loop_interchange_iterators 0.0085206
256_512_512_256 rust loop_interchange 0.0316979
256_512_512_256 python loop_interchange_numba 0.0406425
256_512_512_256 rust naive 0.0565698
256_512_512_256 python naive_numba 0.0706965
512_512_512_512 python numpy_dot_OPENBLAS_NUM_THREADS=4 0.00223443
512_512_512_512 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.00688939
512_512_512_512 rust loop_interchange_uncheck 0.0275972
512_512_512_512 rust loop_interchange_iterators 0.0337366
512_512_512_512 rust loop_interchange 0.124312
512_512_512_512 python loop_interchange_numba 0.172666
512_512_512_512 rust naive 0.205607
512_512_512_512 python naive_numba 0.257431
512_1024_1024_512 python numpy_dot_OPENBLAS_NUM_THREADS=4 0.00423702
512_1024_1024_512 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.0142917
512_1024_1024_512 rust loop_interchange_uncheck 0.0764949
512_1024_1024_512 rust loop_interchange_iterators 0.0842156
512_1024_1024_512 rust loop_interchange 0.256322
512_1024_1024_512 python loop_interchange_numba 0.318879
512_1024_1024_512 python naive_numba 0.700258
512_1024_1024_512 rust naive 1.11825
1024_1024_1024_1024 python numpy_dot_OPENBLAS_NUM_THREADS=4 0.0173084
1024_1024_1024_1024 python numpy_dot_OPENBLAS_NUM_THREADS=1 0.057143
1024_1024_1024_1024 rust loop_interchange_uncheck 0.458546
1024_1024_1024_1024 rust loop_interchange_iterators 0.463142
1024_1024_1024_1024 rust loop_interchange 1.08788
1024_1024_1024_1024 python loop_interchange_numba 1.2439
1024_1024_1024_1024 rust naive 5.13137
1024_1024_1024_1024 python naive_numba 8.68298

Machine info

lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               142
Model name:          Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Stepping:            12
CPU MHz:             1900.034
CPU max MHz:         3900.0000
CPU min MHz:         400.0000
BogoMIPS:            3600.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities

poetry run python -c "import numpy; numpy.show_config()"

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

rustc --version --verbose

rustc 1.54.0 (a178d0322 2021-07-26)
binary: rustc
commit-hash: a178d0322ce20e33eac124758e837cbd80a6f633
commit-date: 2021-07-26
host: x86_64-unknown-linux-gnu
release: 1.54.0
LLVM version: 12.0.1

poetry run python -m numba -s

System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2021-10-03 15:33:47.723610
UTC start time                                : 2021-10-03 12:33:47.723617
Running time (s)                              : 1.628139

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake
CPU Count                                     : 8
Number of accessible CPUs                     : 8
List of accessible CPUs cores                 : 0-7
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 bmi bmi2
                                                clflushopt cmov cx16 cx8 f16c fma
                                                fsgsbase fxsr invpcid lzcnt mmx
                                                movbe pclmul popcnt prfchw rdrnd
                                                rdseed sahf sgx sse sse2 sse3
                                                sse4.1 sse4.2 ssse3 xsave xsavec
                                                xsaveopt xsaves

Memory Total (MB)                             : 23804
Memory Available (MB)                         : 8832

__OS Information__
Platform Name                                 : Linux-4.15.0-153-generic-x86_64-with-glibc2.27
Platform Release                              : 4.15.0-153-generic
OS Name                                       : Linux
OS Version                                    : #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021
OS Specific Version                           : ?
Libc Version                                  : glibc 2.27

__Python Information__
Python Compiler                               : GCC 7.5.0
Python Implementation                         : CPython
Python Version                                : 3.8.6
Python Locale                                 : en_GB.UTF-8

__Numba Toolchain Versions__
Numba Version                                 : 0.54.0
llvmlite Version                              : 0.37.0

__LLVM Information__
LLVM Version                                  : 11.1.0

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.