intel/intel-extension-for-tensorflow

Illegal instruction (core dumped)

wswsmao opened this issue · 25 comments

Hello,
I am running into a problem similar to the issue below:
#51

Here is the output:

(venv) # python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-26 06:54:28.412809: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-26 06:54:28.505267: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-26 06:54:28.505904: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-26 06:54:30.103999: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Illegal instruction (core dumped)

These are my versions:

# pip list |grep tensorflow
intel-extension-for-tensorflow     2.13.0.0
intel-extension-for-tensorflow-lib 2.13.0.0.0
tensorflow                         2.13.0
tensorflow-estimator               2.13.0
tensorflow-io-gcs-filesystem       0.36.0
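
Because the import itself crashes, `itex.__version__` can never be printed on an affected machine. A minimal sketch, assuming only the Python standard library, that reads the same version information from the installed package metadata without importing TensorFlow or ITEX (the helper name `installed_versions` is made up for illustration):

```python
from importlib import metadata

# Distribution names as they appear in the `pip list` output above.
PACKAGES = (
    "tensorflow",
    "intel-extension-for-tensorflow",
    "intel-extension-for-tensorflow-lib",
)

def installed_versions(names=PACKAGES):
    """Return {package: version or None} from package metadata, without importing anything."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name}: {version}")
```

This only reports what is installed; it does not say whether a given TensorFlow/ITEX pair is a supported combination.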

Also, I cannot install ITEX 1.2.0:

# pip install intel-extension-for-tensorflow==1.2.0
ERROR: Could not find a version that satisfies the requirement intel-extension-for-tensorflow==1.2.0 (from versions: 0.0.0.dev1, 2.13.0.0, 2.13.0.1, 2.14.0.0, 2.14.0.1, 2.14.0.2)
ERROR: No matching distribution found for intel-extension-for-tensorflow==1.2.0

So I can only install ITEX 2.13.0.

This is my environment:

# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Smdbmds
  Model name:            Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
    BIOS Model name:     3.0  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          6
    Model:               94
    Thread(s) per core:  1
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            3
    BogoMIPS:            4988.26
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 arat

I switched to a new environment:

# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Smdbmds
  Model name:            Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
    BIOS Model name:     3.0
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            5
    BogoMIPS:            4988.28
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni

and it still fails:

# pip list | grep tensorflow
intel-extension-for-tensorflow     1.2.0
intel-extension-for-tensorflow-lib 1.2.0.0
tensorflow                         2.12.0
tensorflow-estimator               2.12.0
tensorflow-io-gcs-filesystem       0.36.0

# python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-27 16:38:19.340349: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-27 16:38:19.478901: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-27 16:38:20.154571: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-27 16:38:20.155380: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-27 16:38:21.379435: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-27 16:38:22.219570: E itex/core/kernels/xpu_kernel.cc:38] XPU-GPU kernel not supported.
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
2024-02-27 16:38:22.301632: E itex/core/kernels/xpu_kernel.cc:38] XPU-GPU kernel not supported.
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
2024-02-27 16:38:22.302928: F itex/core/utils/op_kernel.cc:54] Check failed: false Multiple KernelCreateFunc registration
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
Aborted (core dumped)

Thanks for reporting this. Let me try to reproduce it on my end and get back to you.

Hello, could you please try the latest 2.14 release of Intel Extension for TensorFlow, https://github.com/intel/intel-extension-for-tensorflow/releases/tag/v2.14.0.1? This release is verified on 2nd Gen Xeon Scalable processors.

Many thanks!

YuningQiu

Hi @YuningQiu, same error:

# python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-28 01:55:04.519696: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-28 01:55:04.561273: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-28 01:55:04.561313: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-28 01:55:04.561358: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-28 01:55:04.569526: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-28 01:55:04.569760: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-28 01:55:05.598361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Illegal instruction (core dumped)

I have already updated:

# pip list | grep tensorflow
intel-extension-for-tensorflow     2.14.0.1
intel-extension-for-tensorflow-lib 2.14.0.1.0
tensorflow                         2.14.1
tensorflow-estimator               2.14.0
tensorflow-io-gcs-filesystem       0.36.0

[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip

Hello, I am not able to reproduce the issue on my side.

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
Stepping: 7
CPU MHz: 1036.127
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

$ pip list |grep tensorflow
intel-extension-for-tensorflow 2.13.0.1
intel-extension-for-tensorflow-lib 2.13.0.1.0
tensorflow 2.13.0
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.34.0

$ pip list
Package Version
absl-py 2.1.0
astunparse 1.6.3
cachetools 5.3.3
certifi 2024.2.2
charset-normalizer 3.3.2
flatbuffers 23.5.26
gast 0.4.0
google-auth 2.28.1
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
grpcio 1.62.0
h5py 3.10.0
idna 3.6
importlib-metadata 7.0.1
intel-extension-for-tensorflow 2.13.0.1
intel-extension-for-tensorflow-lib 2.13.0.1.0
keras 2.13.1
libclang 16.0.6
Markdown 3.5.2
MarkupSafe 2.1.5
numpy 1.23.5
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 23.2
pip 24.0
pkg_resources 0.0.0
protobuf 4.25.3
pyasn1 0.5.1
pyasn1-modules 0.3.0
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
setuptools 69.1.1
six 1.16.0
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorflow 2.13.0
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.34.0
termcolor 2.4.0
typing_extensions 4.5.0
urllib3 2.2.1
Werkzeug 3.0.1
wheel 0.42.0
wrapt 1.16.0
zipp 3.17.0

$ python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-29 09:39:12.960106: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-29 09:39:13.001971: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-29 09:39:13.660827: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-29 09:39:14.011651: I itex/core/wrapper/itex_cpu_wrapper.cc:42] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-02-29 09:39:14.049942: W itex/core/ops/op_init.cc:58] Op: _QuantizedMaxPool3D is already registered in Tensorflow
2.13.0.1

Could you please upgrade pip and then run "pip install --upgrade intel-extension-for-tensorflow[cpu]" again?


Hi @YuningQiu, this is my package list:

# pip list
Package                            Version
---------------------------------- ----------
absl-py                            2.1.0
astunparse                         1.6.3
cachetools                         5.3.3
certifi                            2024.2.2
charset-normalizer                 3.3.2
flatbuffers                        23.5.26
gast                               0.5.4
google-auth                        2.28.1
google-auth-oauthlib               1.0.0
google-pasta                       0.2.0
grpcio                             1.62.0
h5py                               3.10.0
idna                               3.6
importlib-metadata                 7.0.1
intel-extension-for-tensorflow     2.14.0.2
intel-extension-for-tensorflow-lib 2.14.0.2.0
keras                              2.14.0
libclang                           16.0.6
Markdown                           3.5.2
MarkupSafe                         2.1.5
ml-dtypes                          0.2.0
numpy                              1.24.4
oauthlib                           3.2.2
opt-einsum                         3.3.0
packaging                          23.2
pip                                24.0
protobuf                           4.23.4
pyasn1                             0.5.1
pyasn1-modules                     0.3.0
requests                           2.31.0
requests-oauthlib                  1.3.1
rsa                                4.9
setuptools                         68.0.0
six                                1.16.0
tensorboard                        2.14.1
tensorboard-data-server            0.7.2
tensorflow                         2.14.1
tensorflow-estimator               2.14.0
tensorflow-io-gcs-filesystem       0.36.0
termcolor                          2.4.0
typing_extensions                  4.10.0
urllib3                            2.2.1
Werkzeug                           3.0.1
wheel                              0.42.0
wrapt                              1.14.1
zipp                               3.17.0

Maybe an instruction is not supported on this CPU?

# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Smdbmds
  Model name:            Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
    BIOS Model name:     3.0  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          6
    Model:               94
    Thread(s) per core:  1
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            3
    BogoMIPS:            4988.26
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 arat
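
The two `lscpu` outputs in this thread differ in their SIMD flags: the Gold 6133 lists `avx2` but no `avx512*` entries, while the Platinum 8255C lists `avx512f` through `avx512_vnni`. A small sketch for comparing such flag lists, assuming Linux and an lscpu-style space-separated flags string. The helper names are hypothetical and not part of ITEX, and the check only reports which flags the CPU advertises, not which instruction actually triggered the crash:

```python
# Hypothetical helpers, not part of ITEX or TensorFlow.
AVX512_FLAGS = ("avx512f", "avx512dq", "avx512cd", "avx512bw", "avx512vl", "avx512_vnni")

def simd_report(flags_line):
    """Return {flag: present?} for AVX2 and the AVX-512 flags in a space-separated flags string."""
    present = set(flags_line.split())
    return {name: name in present for name in ("avx2",) + AVX512_FLAGS}

def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the first 'flags' line from /proc/cpuinfo (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""

# Shortened excerpts of the flag lists from the two lscpu outputs in this thread.
gold_6133 = "fma avx f16c bmi1 hle avx2 smep bmi2 erms rdseed adx"
platinum_8255c = "fma avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni"

print(simd_report(gold_6133))
print(simd_report(platinum_8255c))
```

On a live machine, `simd_report(read_cpu_flags())` gives the same report for the local CPU.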


@YuningQiu, actually, I get the same error in my new environment:

absl-py                            2.1.0
astunparse                         1.6.3
cachetools                         5.3.3
certifi                            2024.2.2
charset-normalizer                 3.3.2
flatbuffers                        23.5.26
gast                               0.4.0
google-auth                        2.28.1
google-auth-oauthlib               1.0.0
google-pasta                       0.2.0
grpcio                             1.62.0
h5py                               3.10.0
idna                               3.6
importlib-metadata                 7.0.1
intel-extension-for-tensorflow     2.13.0.1
intel-extension-for-tensorflow-lib 2.13.0.1.0
keras                              2.13.1
libclang                           16.0.6
Markdown                           3.5.2
MarkupSafe                         2.1.5
numpy                              1.23.5
oauthlib                           3.2.2
opt-einsum                         3.3.0
packaging                          23.2
pip                                24.0
protobuf                           4.25.3
pyasn1                             0.5.1
pyasn1-modules                     0.3.0
requests                           2.31.0
requests-oauthlib                  1.3.1
rsa                                4.9
setuptools                         53.0.0
six                                1.16.0
tensorboard                        2.13.0
tensorboard-data-server            0.7.2
tensorflow                         2.13.0
tensorflow-estimator               2.13.0
tensorflow-io-gcs-filesystem       0.36.0
termcolor                          2.4.0
typing_extensions                  4.5.0
urllib3                            2.2.1
Werkzeug                           3.0.1
wheel                              0.42.0
wrapt                              1.16.0
zipp                               3.17.0
# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Smdbmds
  Model name:            Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
    BIOS Model name:     3.0
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            5
    BogoMIPS:            4988.28
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni

This is the error:

# python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-29 10:19:32.352107: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-29 10:19:32.457549: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-29 10:19:33.124946: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-29 10:19:33.125560: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-29 10:19:34.277521: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-29 10:19:35.751851: I itex/core/wrapper/itex_cpu_wrapper.cc:42] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-02-29 10:19:35.809138: W itex/core/ops/op_init.cc:58] Op: _QuantizedMaxPool3D is already registered in Tensorflow
2024-02-29 10:19:35.834854: F itex/core/utils/op_kernel.cc:54] Check failed: false Multiple KernelCreateFunc registration
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
Aborted (core dumped)

Hello, I think the Xeon Gold 6133 SKU is not supported. The latest ITEX should support CPUs starting from 2nd Gen Xeon Scalable.

Could you please collect a gdb backtrace using this command so that we can get more information?
$ gdb --args python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
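
Two different failures appear in this thread: "Illegal instruction (core dumped)" on one machine and "Aborted (core dumped)" on the other. As a rough complement to the gdb backtrace, the sketch below runs the import in a child interpreter and reports which signal, if any, killed it. It assumes a POSIX system and uses only the standard library; `describe_exit` and `run_import_check` are hypothetical helper names:

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with code {returncode}"

def run_import_check(module_name):
    """Import module_name in a child interpreter so a crash cannot take down the caller."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(proc.returncode)

print(run_import_check("intel_extension_for_tensorflow"))
```

A SIGILL result would correspond to the "Illegal instruction" case and SIGABRT to the failed check that ends in "Aborted". An exit code of 1 usually just means the import raised an ordinary Python exception, for example because the module is not installed.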


This is the backtrace:

Thread 1 "python" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	      return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-4.ocs23.x86_64 libb2-0.98.1-2.ocs23.x86_64 libffi-3.4.4-2.ocs23.x86_64 libgcc-12.3.1-2.ocs23.x86_64 libgomp-12.3.1-2.ocs23.x86_64 libstdc++-12.3.1-2.ocs23.x86_64 mpdecimal-2.5.1-4.ocs23.x86_64 openssl-libs-3.0.12-2.ocs23.x86_64 python3-libs-3.11.6-2.ocs23.x86_64 xz-libs-5.4.4-1.ocs23.x86_64 zlib-1.2.13-4.ocs23.x86_64
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff708cff3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff703da26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff702687c in __GI_abort () at abort.c:79
#4  0x00007fff89b11567 in itex::internal::LogMessageFatal::~LogMessageFatal() ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/../intel_extension_for_tensorflow/libitex_cpu_internal_avx2.so
#5  0x00007fff89b28a77 in itex::OpTypeFactory::RegisterOpType(void* (*)(TF_OpKernelConstruction*), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .cold] ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/../intel_extension_for_tensorflow/libitex_cpu_internal_avx2.so
#6  0x00007fff89b2d280 in itex::Name::Build(char const*, char const*) ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/../intel_extension_for_tensorflow/libitex_cpu_internal_avx2.so
#7  0x00007fff889309aa in itex::Register1(char const*, char const*) ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/../intel_extension_for_tensorflow/libitex_cpu_internal_avx2.so
#8  0x00007fff89b25819 in itex::register_kernel::RegisterCPUKernels(char const*) ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/../intel_extension_for_tensorflow/libitex_cpu_internal_avx2.so
#9  0x00007fff8892ffbb in TF_InitKernel_Internal ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/../intel_extension_for_tensorflow/libitex_cpu_internal_avx2.so
#10 0x00007fff9274875d in TF_InitKernel ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow-plugins/libitex_cpu.so
#11 0x00007ffff24cc016 in tensorflow::RegisterPluggableDevicePlugin(void*) ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow/python/platform/../../libtensorflow_cc.so.2
#12 0x00007ffff24c5fbc in TF_LoadPluggableDeviceLibrary ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow/python/platform/../../libtensorflow_cc.so.2
#13 0x00007fff9959617d in pybind11::cpp_function::initialize<pybind11_init__pywrap_tf_session(pybind11::module_&)::$_66, TF_Library*, char const*, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::return_value_policy>(pybind11_init__pywrap_tf_session(pybind11::module_&)::$_66&&, TF_Library* (*)(char const*), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::return_value_policy const&)::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow/python/client/_pywrap_tf_session.so
#14 0x00007fff9955ad58 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
   from /home/tensorflow/v_tensorflow/lib64/python3.11/site-packages/tensorflow/python/client/_pywrap_tf_session.so
#15 0x00007ffff75d51f1 in cfunction_call () from /lib64/libpython3.11.so.1.0
#16 0x00007ffff75b7713 in _PyObject_MakeTpCall () from /lib64/libpython3.11.so.1.0
#17 0x00007ffff75c0217 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#18 0x00007ffff75bc1aa in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#19 0x00007ffff7645e16 in PyEval_EvalCode () from /lib64/libpython3.11.so.1.0
#20 0x00007ffff765cdd2 in builtin_exec () from /lib64/libpython3.11.so.1.0
#21 0x00007ffff75cdffa in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.11.so.1.0
#22 0x00007ffff75c4747 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#23 0x00007ffff75bc1aa in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#24 0x00007ffff75d49b6 in object_vacall () from /lib64/libpython3.11.so.1.0
#25 0x00007ffff75f9a44 in PyObject_CallMethodObjArgs () from /lib64/libpython3.11.so.1.0
#26 0x00007ffff75f81dc in PyImport_ImportModuleLevelObject () from /lib64/libpython3.11.so.1.0
--Type <RET> for more, q to quit, c to continue without paging--c
#27 0x00007ffff75c59de in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#28 0x00007ffff75bc1aa in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#29 0x00007ffff7645e16 in PyEval_EvalCode () from /lib64/libpython3.11.so.1.0
#30 0x00007ffff765cdd2 in builtin_exec () from /lib64/libpython3.11.so.1.0
#31 0x00007ffff75cdffa in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.11.so.1.0
#32 0x00007ffff75c4747 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#33 0x00007ffff75bc1aa in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#34 0x00007ffff75d49b6 in object_vacall () from /lib64/libpython3.11.so.1.0
#35 0x00007ffff75f9a44 in PyObject_CallMethodObjArgs () from /lib64/libpython3.11.so.1.0
#36 0x00007ffff75f81dc in PyImport_ImportModuleLevelObject () from /lib64/libpython3.11.so.1.0
#37 0x00007ffff75c59de in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#38 0x00007ffff75bc1aa in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#39 0x00007ffff7645e16 in PyEval_EvalCode () from /lib64/libpython3.11.so.1.0
#40 0x00007ffff7663d33 in run_eval_code_obj () from /lib64/libpython3.11.so.1.0
#41 0x00007ffff76602ba in run_mod () from /lib64/libpython3.11.so.1.0
#42 0x00007ffff7654bcd in PyRun_StringFlags () from /lib64/libpython3.11.so.1.0
#43 0x00007ffff7654920 in PyRun_SimpleStringFlags () from /lib64/libpython3.11.so.1.0
#44 0x00007ffff766f145 in Py_RunMain () from /lib64/libpython3.11.so.1.0
#45 0x00007ffff7635f9b in Py_BytesMain () from /lib64/libpython3.11.so.1.0
#46 0x00007ffff7027f50 in __libc_start_call_main (main=main@entry=0x555555555160 <main>, argc=argc@entry=3, 
    argv=argv@entry=0x7fffffffe108) at ../sysdeps/nptl/libc_start_call_main.h:58
#47 0x00007ffff7028009 in __libc_start_main_impl (main=0x555555555160 <main>, argc=3, argv=0x7fffffffe108, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe0f8) at ../csu/libc-start.c:360
#48 0x0000555555555095 in _start ()

@wswsmao If you are using venv, please try switching to conda.

@guizili0 ok, i will try it

Hi @guizili0, it seems the doc only describes installing with pip?
https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_cpu.md

conda command:

# conda search  *intel-extension-for-tensorflow*
Loading channels: done

PackagesNotFoundError: The following packages are not available from current channels:

  - *intel-extension-for-tensorflow*

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

@wswsmao Sorry for the confusion. I mean you can create the env via conda and then use pip to install.

Hi @guizili0, it works. It is time to update the docs.

(itex_build)  # python quick_example.py
2024-03-04 16:20:47.462221: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-04 16:20:47.464529: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-04 16:20:47.505677: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-04 16:20:47.505728: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-04 16:20:47.505787: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-04 16:20:47.513488: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-04 16:20:47.513739: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-04 16:20:48.300304: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-04 16:20:48.661969: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
tf.Tensor(
[[[[2.9566352 2.5960028 2.6510603]
   [2.5197754 2.3684092 2.145223 ]
   [3.125528  2.7386546 3.5465343]
   [3.4885104 3.9240358 4.0364857]
   [1.476208  1.5018826 1.921063 ]]

  [[2.8954098 2.9481623 3.6797826]
   [3.3527603 2.776656  3.0839703]
   [3.4763882 2.7880876 2.5138347]
   [3.317111  3.3032947 2.9439278]
   [2.3710513 2.5041685 2.1902466]]

  [[3.716969  3.6650152 2.9369717]
   [2.72032   2.8194175 2.781646 ]
   [2.4257572 2.911467  2.8563507]
   [2.8951228 2.3830342 3.1627011]
   [2.3537884 3.017113  2.3408718]]

  [[3.6247163 4.0131707 4.250465 ]
   [3.353158  2.9245052 3.3258662]
   [3.866764  3.136556  3.1926696]
   [3.713012  3.3164258 2.899124 ]
   [1.745955  2.5850582 2.0824847]]

  [[2.2759743 2.7298818 1.8404391]
   [1.7627361 2.185912  1.516212 ]
   [1.5935009 2.1497884 1.6038413]
   [1.689346  1.5855561 1.5915074]
   [1.0986044 1.2527758 1.0862353]]]], shape=(1, 5, 5, 3), dtype=float32)
Finished

@wswsmao In our validation we also test venv, but we did not reproduce this issue.
Below is the Dockerfile I used to try to reproduce it, without success. Could you share your reproduction steps? Thanks.

FROM ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive

HEALTHCHECK NONE

RUN ln -sf bash /bin/sh

RUN apt-get update && \
    apt-get install -y --no-install-recommends --fix-missing \
    wget \
    apt-utils \
    ca-certificates \
    git \
    vim \
    apt-transport-https curl gnupg \
    python-is-python3 \
    python3.10-venv \
    pip \
    gdb \
    strace \
    gpg && \
    apt-get clean && \
    rm -rf  /var/lib/apt/lists/*

RUN mkdir -p /test
RUN python -m venv /test/venv_test
RUN source /test/venv_test/bin/activate
RUN pip install --upgrade intel-extension-for-tensorflow[cpu]

output is:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-05 03:21:25.951828: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-05 03:21:25.954911: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-05 03:21:26.000753: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-05 03:21:26.000794: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-05 03:21:26.000832: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-05 03:21:26.010442: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-05 03:21:26.010780: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-05 03:21:27.224707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-05 03:21:27.592314: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
>>>

@guizili0 OK, these are my steps. conda works in the same environment.

# lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Smdbmds
  Model name:            Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
    BIOS Model name:     3.0  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          6
    Model:               94
    Thread(s) per core:  1
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            3
    BogoMIPS:            4988.26
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 arat
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   512 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    64 MiB (16 instances)
  L3:                    27.5 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Vulnerable
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling
  Srbds:                 Unknown: Dependent on hypervisor status
  Tsx async abort:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
# uname -a
Linux VM-33-248-tlinux 5.18.15-2207.2.0.ocks #1 SMP PREEMPT_DYNAMIC Wed Nov 9 11:41:31 CST 2022 x86_64 GNU/Linux
# cat /etc/os-release 
NAME="OpenCloudOS Stream"
VERSION="2301"
ID="opencloudos"
ID_LIKE="opencloudos"
VERSION_ID="2301"
PLATFORM_ID="platform:ocs2301"
PRETTY_NAME="OpenCloudOS Stream 2301"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:opencloudos:opencloudos:2301"
HOME_URL="https://www.opencloudos.org/"
BUG_REPORT_URL="https://bugs.opencloudos.tech/"


# python -m venv itex

# source itex/bin/activate

(itex) # pip install --upgrade intel-extension-for-tensorflow[cpu]
(itex) # pip list | grep tensorflow
intel-extension-for-tensorflow     2.14.0.2
intel-extension-for-tensorflow-lib 2.14.0.2.0
tensorflow                         2.14.1
tensorflow-estimator               2.14.0
tensorflow-io-gcs-filesystem       0.36.0

[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip

(itex) # python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-03-05 06:28:43.384035: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-05 06:28:43.424872: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-05 06:28:43.424910: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-05 06:28:43.424949: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-05 06:28:43.432727: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-05 06:28:43.432951: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-05 06:28:44.449451: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-05 06:28:44.891640: I itex/core/wrapper/itex_cpu_wrapper.cc:70] Intel Extension for Tensorflow* AVX2 CPU backend is loaded.
2024-03-05 06:28:44.954756: F itex/core/utils/op_kernel.cc:54] Check failed: false Multiple KernelCreateFunc registration
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
Aborted (core dumped)
(itex) 
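
The lscpu flags in this report list avx2 but no avx512 flags, which is consistent with the "AVX2 CPU backend is loaded" line in the log. A minimal sketch for checking the same thing from Python follows. It assumes a Linux `/proc/cpuinfo`-style `flags` line, and the helper names `cpu_flags` and `simd_summary` are hypothetical. It only reports what the (possibly virtualized) CPU exposes and says nothing about how ITEX itself picks a backend.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # /proc/cpuinfo uses "flags", lscpu uses "Flags"; accept both.
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def simd_summary(flags):
    """Report which of the SIMD flags mentioned in the TensorFlow log are present."""
    return {name: name in flags for name in ("avx2", "avx512f", "avx512_vnni")}


# Small illustrative input; on a real Linux machine use:
#   simd_summary(cpu_flags(open("/proc/cpuinfo").read()))
sample = "flags\t\t: fpu sse sse2 avx f16c avx2 bmi2 adx\n"
print(simd_summary(cpu_flags(sample)))
```

Note that lscpu may wrap the flags over several lines, in which case this sketch only sees the first line, so `/proc/cpuinfo` is the safer input.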

Hi @guizili0, I have run this demo in conda:
https://github.com/intel/intel-extension-for-tensorflow/blob/main/examples/train_bert/README.md

I have a couple of questions:

  1. This demo runs without changing the original code. How can I verify that ITEX is taking effect?
  2. How can I get performance improvement data compared with no ITEX? Uninstall ITEX and re-run?

2024-03-04 16:20:48.661969: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.

@wswsmao

This demo can run without changing the original code, how to prove that ITEX is effective?
You can check the log. ITEX prints log lines like the following when it is enabled:

2024-03-08 11:44:24.621184: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-08 11:44:25.439443: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-08 11:44:25.830716: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-03-08 11:44:26.371964: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-03-08 11:44:26.492596: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-03-08 11:44:26.492973: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-03-08 11:44:26.492984: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.

Tensorflow version 2.14.1
2024-03-08 11:44:27.254820: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-03-08 11:44:27.254869: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-03-08 11:44:27.254900: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: )
2024-03-08 11:44:27.255202: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:1 with 0 MB memory) -> physical PluggableDevice (device: 1, name: XPU, pci bus id: )
2024-03-08 11:44:27.809896: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type XPU is enabled.
Load cat.jpg to inference
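
The log check described above can be scripted. This is a minimal sketch, assuming the run's stderr was saved to a file (for example with `python train.py 2> run.log`, where the script name is a placeholder). The regex is based only on the "backend is loaded" lines quoted in this thread, so other ITEX versions may word them differently. The names `BACKEND_RE` and `loaded_itex_backends` are hypothetical.

```python
import re

# Matches e.g. "Intel Extension for Tensorflow* AVX512 CPU backend is loaded."
# and "Intel Extension for Tensorflow* GPU backend is loaded."
BACKEND_RE = re.compile(
    r"Intel Extension for Tensorflow\* (?P<isa>\S+ )?(?P<device>CPU|GPU) backend is loaded"
)


def loaded_itex_backends(log_text):
    """Return (device, isa) pairs for every ITEX backend-loaded line in the log."""
    found = []
    for match in BACKEND_RE.finditer(log_text):
        isa = (match.group("isa") or "").strip()
        found.append((match.group("device"), isa))
    return found


sample_log = (
    "I itex/core/wrapper/itex_cpu_wrapper.cc:60] "
    "Intel Extension for Tensorflow* AVX512 CPU backend is loaded.\n"
    "I itex/core/wrapper/itex_gpu_wrapper.cc:35] "
    "Intel Extension for Tensorflow* GPU backend is loaded.\n"
)
print(loaded_itex_backends(sample_log))
```

An empty list suggests the plugin was not loaded in that run, assuming the log format matches.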

  2. how can I get performance improvement data compared with no ITEX? uninstall ITEX and re-run?

I recommend creating two conda environments: one with ITEX and one without.
You can then compare the performance with and without ITEX.
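
To make the two-environment comparison repeatable, the same timing script can be run once in each environment. The following is a minimal sketch, not an official ITEX benchmark. `benchmark` and `placeholder_step` are hypothetical names, and the placeholder workload must be replaced with one training or inference step of the real model.

```python
import statistics
import time


def benchmark(step, warmup=3, iters=10):
    """Time a zero-argument callable and return simple wall-clock statistics."""
    # Warm-up runs absorb one-off costs (graph tracing, memory allocation).
    for _ in range(warmup):
        step()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        times.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(times),
        "min_s": min(times),
        "iters": iters,
    }


def placeholder_step():
    # Stand-in workload so the sketch runs anywhere; swap in a model step.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(benchmark(placeholder_step))
```

Run the identical script in both environments and compare the medians. When timing TensorFlow code, make sure the step waits for its result (for example by converting the output to a NumPy array) so that the measurement covers the actual computation.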

@xiguiw ok, I will try it

Hi @guizili0, there are some reasons why I have to use Python venv instead of conda.
I was wondering if there is any way around this, such as picking out the key files from the conda env and copying them into the Python venv.

@wswsmao In my understanding, a soft link in the venv can cause the ITEX .so file to be loaded twice, which would cause this crash. You can try removing this soft link and check whether the issue goes away.

To find the soft link, you can use strace -e trace=open,openat python to dump the .so file load log and check the exact locations.
You would get logs like the ones below:

openat(AT_FDCWD, "/opt/app-root/lib/python3.9/site-packages/tensorflow-plugins/libitex_cpu.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/opt/app-root/lib64/python3.9/site-packages/tensorflow-plugins/libitex_cpu.so", O_RDONLY|O_CLOEXEC) = 3
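
Scanning a long strace log by eye is error-prone, so here is a minimal sketch that lists shared libraries opened successfully from more than one path. It assumes the trace was saved to a file, for example with `strace -f -e trace=open,openat -o trace.log python -c "import intel_extension_for_tensorflow"`. The names `OPEN_RE` and `libs_opened_from_multiple_paths` are hypothetical.

```python
import os
import re
from collections import defaultdict

# Matches open()/openat() calls on paths containing ".so" and captures the
# path and the syscall return value (a negative value means the open failed).
OPEN_RE = re.compile(
    r'open(?:at)?\((?:AT_FDCWD, )?"(?P<path>[^"]+\.so[^"]*)".*\) = (?P<ret>-?\d+)'
)


def libs_opened_from_multiple_paths(strace_text):
    """Map library file name -> sorted paths, for names opened from 2+ paths."""
    paths_by_name = defaultdict(set)
    for line in strace_text.splitlines():
        match = OPEN_RE.search(line)
        if match and int(match.group("ret")) >= 0:
            path = match.group("path")
            paths_by_name[os.path.basename(path)].add(path)
    return {
        name: sorted(paths)
        for name, paths in paths_by_name.items()
        if len(paths) > 1
    }


sample = (
    'openat(AT_FDCWD, "/opt/app-root/lib/python3.9/site-packages/'
    'tensorflow-plugins/libitex_cpu.so", O_RDONLY|O_CLOEXEC) = 3\n'
    'openat(AT_FDCWD, "/opt/app-root/lib64/python3.9/site-packages/'
    'tensorflow-plugins/libitex_cpu.so", O_RDONLY|O_CLOEXEC) = 3\n'
)
for name, paths in libs_opened_from_multiple_paths(sample).items():
    print(name, paths)
```

A library showing up under both `lib` and `lib64` is the pattern described above. Other libraries may legitimately appear more than once, so treat the output as a list of candidates to inspect.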

@guizili0 It works.
I found these two files:

./itex_venv/lib64/python3.11/site-packages/tensorflow-plugins/libitex_cpu.so
./itex_venv/lib/python3.11/site-packages/tensorflow-plugins/libitex_cpu.so

I removed the .so under lib, although it is the same .so file rather than a soft link.
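
For others hitting the same crash, a minimal sketch for spotting this situation in a venv follows. It assumes the `lib*/python*/site-packages/tensorflow-plugins/` layout seen in the paths above. The helper names are hypothetical, and the script only reports what it finds. Whether deleting one copy is safe on a given system is not something it can decide.

```python
import glob
import hashlib
import os
import sys


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_plugin_copies(prefix, filename="libitex_cpu.so"):
    """List every copy of the plugin reachable under prefix/lib*."""
    pattern = os.path.join(
        prefix, "lib*", "python*", "site-packages", "tensorflow-plugins", filename
    )
    report = []
    for path in sorted(glob.glob(pattern)):
        report.append({
            "path": path,
            # Equal realpaths mean one file reached through a soft link;
            # different realpaths with equal digests mean two real copies.
            "realpath": os.path.realpath(path),
            "sha256": file_digest(path),
        })
    return report


if __name__ == "__main__":
    # Inside an activated venv, sys.prefix points at the venv root.
    for entry in find_plugin_copies(sys.prefix):
        print(entry)
```

Two entries with different `realpath` values but the same `sha256` correspond to the case reported here: the same .so present as a regular file under both `lib` and `lib64`.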