Sample quantize_inception_v3 runs into segmentation fault
YanfeiXu opened this issue · 4 comments
YanfeiXu commented
OS: Ubuntu 23.04 (Docker container)
Hardware: Xeon (Ice Lake)
Components installed: oneAPI base toolkit, python3.9, pip, conda
Problem: Following the steps of the quantize_inception_v3 sample, I ran into a segmentation fault in Jupyter. I then converted quantize_inception_v3.ipynb into a .py file and ran it with ipython, which also ended in a segmentation fault.
(env_itex) (base) root@610430605d50:/intel-extension-for-tensorflow/examples/quantize_inception_v3# pip list |grep tensor
intel-extension-for-tensorflow 2.13.0.0
intel-extension-for-tensorflow-lib 2.13.0.0.0
tensorboard 2.13.0
tensorboard-data-server 0.7.1
tensorflow 2.13.0
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.33.0
ipython quantize_inception_v3.py
........
23/23 [==============================] - 2s 79ms/step - loss: 0.3944 - accuracy: 0.8638
INFO:tensorflow:Assets written to: model_keras.fp32/assets
2023-08-09 12:24:52,637 - tensorflow - INFO - Assets written to: model_keras.fp32/assets
Save model to model_keras.fp32
version: 1.0
model:
  name: inception_v3
  framework: tensorflow_itex # possible values are tensorflow, mxnet and pytorch
evaluation:
  accuracy:
    metric:
      topk: 1 # built-in metrics are topk, map, f1, allow user to register new metric.
tuning:
  accuracy_criterion:
    relative: 0.01 # the tuning target of accuracy loss percentage: 2%
  exit_policy:
    timeout: 0 # tuning timeout (seconds)
  random_seed: 100 # random seed
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Using 734 files for validation.
2023-08-09 12:24:53.768187: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type CPU is enabled.
2023-08-09 12:24:53.777478: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type CPU is enabled.
2023-08-09 12:25:07 [WARNING] Output tensor names should not be empty.
2023-08-09 12:25:07 [WARNING] Input tensor names is empty.
INFO:tensorflow:Assets written to: /tmp/tmps4ia3yd9/assets
2023-08-09 12:25:25,183 - tensorflow - INFO - Assets written to: /tmp/tmps4ia3yd9/assets
2023-08-09 12:25:31.852167: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-09 12:25:31.852320: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-08-09 12:25:34.215201: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-09 12:25:34.215403: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-08-09 12:25:36.411005: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-09 12:25:36.411114: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-08-09 12:25:56 [INFO] ConvertLayoutOptimizer elapsed time: 0.41 ms
2023-08-09 12:25:56 [INFO] Pass ConvertPlaceholderToConst elapsed time: 33.94 ms
2023-08-09 12:25:56 [INFO] Pass SwitchOptimizer elapsed time: 32.33 ms
Segmentation fault (core dumped)
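A segmentation fault like the one in this log kills the interpreter before Python can print a traceback, so the output only shows that the crash follows the SwitchOptimizer pass. One possible way to narrow it down, offered as a sketch and not something that was tried in this thread, is Python's standard `faulthandler` module, which dumps the Python-level stack when the process receives a fatal signal such as SIGSEGV. The helper name `enable_crash_traces` is hypothetical.

```python
import faulthandler
import sys


def enable_crash_traces(stream=sys.stderr):
    """Enable faulthandler so a fatal signal (e.g. SIGSEGV) dumps the Python stack.

    Hypothetical helper: call it at the very top of the converted
    quantize_inception_v3.py, before TensorFlow is imported.
    Returns True once faulthandler is active.
    """
    if not faulthandler.is_enabled():
        # all_threads=True prints the stack of every Python thread, not just the current one.
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Running the script as `python -X faulthandler quantize_inception_v3.py` enables the same handler without editing the file. The dump only shows the Python frames that were active at the crash; the native frames inside TensorFlow or ITEX would still need a debugger such as gdb.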
guizili0 commented
@YanfeiXu could you try this on Ubuntu 22 to check whether the issue comes from the OS side or the ITEX side? Thanks.
YanfeiXu commented
Hi, I checked. It also reproduces easily on Ubuntu 22.
evaluation:
  accuracy:
    metric:
      topk: 1 # built-in metrics are topk, map, f1, allow user to register new metric.
tuning:
  accuracy_criterion:
    relative: 0.01 # the tuning target of accuracy loss percentage: 2%
  exit_policy:
    timeout: 0 # tuning timeout (seconds)
  random_seed: 100 # random seed
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Using 734 files for validation.
2023-08-13 02:06:37.926874: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type CPU is enabled.
2023-08-13 02:06:37.936141: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type CPU is enabled.
2023-08-13 02:06:50 [WARNING] Output tensor names should not be empty.
2023-08-13 02:06:50 [WARNING] Input tensor names is empty.
INFO:tensorflow:Assets written to: /tmp/tmp9rt961oq/assets
2023-08-13 02:07:08,099 - tensorflow - INFO - Assets written to: /tmp/tmp9rt961oq/assets
2023-08-13 02:07:14.739487: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-13 02:07:14.739638: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-08-13 02:07:16.971706: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-13 02:07:16.971927: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-08-13 02:07:19.531589: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-13 02:07:19.531719: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-08-13 02:07:39 [INFO] ConvertLayoutOptimizer elapsed time: 0.41 ms
2023-08-13 02:07:39 [INFO] Pass ConvertPlaceholderToConst elapsed time: 34.35 ms
2023-08-13 02:07:39 [INFO] Pass SwitchOptimizer elapsed time: 30.84 ms
Segmentation fault (core dumped)
(itex_build) root@98026ad8adbc:/intel-extension-for-tensorflow/examples/quantize_inception_v3# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
(itex_build) root@98026ad8adbc:/intel-extension-for-tensorflow/examples/quantize_inception_v3#
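The first package listing in this issue filters on `tensor`, so it does not show which neural-compressor version is installed. To make environments easier to compare, a small sketch using the standard `importlib.metadata` module could print every relevant package in one go. The package tuple and the function name `installed_versions` are illustrative assumptions, not part of the sample.

```python
from importlib import metadata

# Distributions mentioned in this thread; extend the tuple as needed.
PACKAGES = (
    "intel-extension-for-tensorflow",
    "intel-extension-for-tensorflow-lib",
    "tensorflow",
    "neural-compressor",
)


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in the active environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name} {version or 'NOT INSTALLED'}")
```

Run it inside the same virtual environment as the sample (for example `env_itex`), since it reports only what the active interpreter can see.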
Dboyqiao commented
- @YanfeiXu The "Segmentation fault (core dumped)" issue cannot be reproduced with the latest packages on Ubuntu 22.04.
$ pip list | grep tensorflow
intel-extension-for-tensorflow 2.13.0.0
intel-extension-for-tensorflow-lib 2.13.0.0.0
tensorflow 2.13.0
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.33.0
$ pip list | grep neural
neural-compressor 2.2.1
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
- This example still fails with an AssertionError raised by neural_compressor, because it is based on the old neural_compressor APIs (v1.x), which are not applicable to the latest neural_compressor (v2.x). @NeoZhangJianyu will update this example in our upcoming 2.14.0 release.
AssertionError:
File "/home/zhefengq/WORKSPACE/intel-extension-for-tensorflow-master/examples/quantize_inception_v3/env_itex/lib/python3.9/site-packages/neural_compressor/metric/metric.py", line 924, in update
preds, labels = _topk_shape_validate(preds, labels)
File "/home/zhefengq/WORKSPACE/intel-extension-for-tensorflow-master/examples/quantize_inception_v3/env_itex/lib/python3.9/site-packages/neural_compressor/metric/metric.py", line 458, in _topk_shape_validate
assert label_N == N, 'labels batch size should same with preds'
AssertionError: labels batch size should same with preds
2023-08-28 10:04:48 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
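The assertion in the traceback compares the batch size of the predictions with that of the labels. A minimal, library-independent sketch of the same kind of check is shown here; `batch_size_mismatch` is a hypothetical helper, not part of neural_compressor, and only illustrates how a batch might be inspected before it is handed to the metric.

```python
def batch_size_mismatch(preds, labels):
    """Compare the first dimension (batch size) of preds and labels.

    Returns None when they match, otherwise a short message describing the mismatch.
    Works with any sequences that support len(), e.g. lists or NumPy arrays.
    """
    n_preds, n_labels = len(preds), len(labels)
    if n_preds == n_labels:
        return None
    return f"preds batch size {n_preds} != labels batch size {n_labels}"


# Example: 4 predictions over 5 classes, but only 3 labels.
preds = [[0.1, 0.2, 0.3, 0.2, 0.2]] * 4
labels = [0, 1, 2]
print(batch_size_mismatch(preds, labels))
```

Logging the result for each evaluation batch would show which batches the metric rejects. This only helps diagnose the mismatch; it does not replace porting the example to the newer API.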