attempting to perform BLAS operation using StreamExecutor without BLAS support
chenyuyuyu opened this issue · 5 comments
Hi. I tried to test the model with the synthetic dataset:
python homography_CNN_synthetic.py --mode test --lr 5e-4 --loss_type h_loss
But something went wrong.
How can I solve this error?
Thanks a lot.
<==================== Loading data ===================>
===> There are totally 5000 test files
===> Test: There are totally 5000 Test files
--Shape of A_mat: [64, 8, 8]
--shape of b: [64, 8, 1]
--shape of H_8el Tensor("MatrixSolve:0", shape=(64, 8, 1), dtype=float32, device=/device:GPU:0)
('--Inter- scale_h:', True)
--Shape of A_mat: [64, 8, 8]
--shape of b: [64, 8, 1]
--shape of H_8el Tensor("MatrixSolve_1:0", shape=(64, 8, 1), dtype=float32, device=/device:GPU:1)
('--Inter- scale_h:', True)
2019-11-08 13:13:32.405944: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-11-08 13:13:32.405991: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-11-08 13:13:32.406016: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-11-08 13:13:32.406028: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-11-08 13:13:32.406037: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-11-08 13:13:32.803078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.721
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 10.72GiB
2019-11-08 13:13:33.060704: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x555d0605e760 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-11-08 13:13:33.061707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.721
pciBusID 0000:04:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
2019-11-08 13:13:33.261950: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x555d06062cb0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-11-08 13:13:33.263073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 2 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:81:00.0
Total memory: 11.17GiB
Free memory: 11.09GiB
2019-11-08 13:13:33.484606: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x555d06067220 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-11-08 13:13:33.485397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 3 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.721
pciBusID 0000:82:00.0
Total memory: 10.91GiB
Free memory: 2.35GiB
2019-11-08 13:13:33.487296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 2
2019-11-08 13:13:33.487355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 3
2019-11-08 13:13:33.487403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 2
2019-11-08 13:13:33.487426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 3
2019-11-08 13:13:33.487440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 2 and 0
2019-11-08 13:13:33.487451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 2 and 1
2019-11-08 13:13:33.487462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 2 and 3
2019-11-08 13:13:33.487480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 3 and 0
2019-11-08 13:13:33.487499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 3 and 1
2019-11-08 13:13:33.487512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 3 and 2
2019-11-08 13:13:33.487569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 2 3
2019-11-08 13:13:33.487583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y Y N N
2019-11-08 13:13:33.487592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: Y Y N N
2019-11-08 13:13:33.487601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 2: N N Y N
2019-11-08 13:13:33.487610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 3: N N N Y
2019-11-08 13:13:33.487626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0)
2019-11-08 13:13:33.487638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0)
2019-11-08 13:13:33.487649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K40c, pci bus id: 0000:81:00.0)
2019-11-08 13:13:33.487659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0)
/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalize
Traceback (most recent call last):
File "homography_CNN_synthetic.py", line 597, in
test_homography()
File "homography_CNN_synthetic.py", line 584, in test_homography
test_obj.run_test(0)
File "homography_CNN_synthetic.py", line 517, in run_test
train_saver.restore(sess,tf.train.latest_checkpoint(args.model_dir))
File "/home/chenxy/anaconda3/envs/last27/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/chenxy/anaconda3/envs/last27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/chenxy/anaconda3/envs/last27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/chenxy/anaconda3/envs/last27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/chenxy/anaconda3/envs/last27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unable to get element from the feed as bytes.
2019-11-08 13:13:34.661958: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-08 13:13:34.661984: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x555d07914c90: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-11-08 13:13:34.662042: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
2019-11-08 13:13:34.662096: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-08 13:13:34.662136: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
2019-11-08 13:13:34.662169: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
2019-11-08 13:13:34.662176: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
2019-11-08 13:13:34.662232: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-08 13:13:34.662292: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
Aborted (core dumped)
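For reference: "failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED" is usually a GPU memory or device-initialization problem rather than a TensorFlow build that lacks BLAS. A minimal TF1 sketch of two common mitigations, assuming a standard tf.Session setup rather than this repo's exact code:

```python
import os
# 1) Restrict the process to one GPU. This must run before TensorFlow
#    initializes CUDA, and it also avoids initializing the mixed-in
#    Tesla K40c (compute capability 3.5, per the log above) alongside
#    the 1080 Ti cards.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf

# 2) Let TensorFlow grow its GPU allocation on demand instead of
#    pre-allocating nearly all free memory at session creation, which
#    can leave cuBLAS unable to allocate its own workspace.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build the graph and restore the checkpoint here
```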
Thanks very much for your reply! Several days ago, you said it is recommended to lower the learning rate.
Yesterday I lowered the lr to 5e-5 to train the unsupervised model:
python homography_CNN_synthetic.py --mode train --lr 5e-5 --loss_type l1_loss
But 15 hours later, it still returns "Input matrix is not invertible":
[=========================>.......................................] Step: 1m35s | Tot: 15h20m | Train: 1, h_loss 16.143, l1_loss 0.351759, l1_smooth_loss 0.152019-11-09 04:48:20.448954: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input matrix is not invertible.
Is there any other solution?
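For reference, if the solve is a plain batched tf.matrix_solve(A, b), as the shape printout above suggests, one generic guard against "Input matrix is not invertible" is to damp A with a small multiple of the identity (Tikhonov-style). A sketch under that assumption, not the repo's code:

```python
import tensorflow as tf

def solve_damped(A_mat, b, eps=1e-6):
    # A_mat: [batch, 8, 8], b: [batch, 8, 1] (the shapes printed above).
    # Adding eps * I keeps each A_mat from being exactly singular;
    # eps is a hypothetical constant, not a repo parameter.
    eye = tf.eye(8, batch_shape=tf.shape(A_mat)[:1], dtype=A_mat.dtype)
    return tf.matrix_solve(A_mat + eps * eye, b)
```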
I think something is wrong with my trained model.
When training finishes, it should generate files in the folder "post_estimation/models/synthetic_models/h_loss_normalize", but the files are generated in "post_estimation/models/synthetic_models" instead, and the "h_loss_normalize" folder has no items.
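That symptom, files carrying an "h_loss_normalize" prefix but landing one directory up, is exactly what string concatenation without a path separator produces (compare the checkpoint paths below, e.g. "h_loss_normalizemodel.ckpt-149999"). A hypothetical illustration of the difference, not the repo's exact code:

```python
import os

model_dir = 'post_estimation/models/synthetic_models/h_loss_normalize'  # no trailing slash

# Plain concatenation drops the separator, so the intended folder name
# becomes a filename prefix in the parent directory:
bad = model_dir + 'model.ckpt'
# -> 'post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt'

# os.path.join inserts the separator and writes into the folder:
good = os.path.join(model_dir, 'model.ckpt')
# -> 'post_estimation/models/synthetic_models/h_loss_normalize/model.ckpt'
```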
I just copied the model you provided into the folder "post_estimation/models/synthetic_models/l1_loss_normalize" and tested it using
python homography_CNN_synthetic.py --mode test --lr 1e-4 --loss_type l1_loss
It ran well.
[================================================================>] Step: 8s878ms | Tot: 1m54s | Test, h_loss 14.033, rec_loss 0.978, ssim_loss 0.406, l1_loss 120/120 ail_percent 0.1990
====> Result for RHO: 45 loss l1_loss noise 0.5
|Steps | h_loss | l1_loss | Fail percent |
119 14.033139856656392 0.6181761850913365 19.90234375
Top 0 - 30 %
Top 30 - 60 %
Top 60 - 100 %
===> Percentile Values: (20, 50, 80, 100):
[[13.0539875 0.35184437]
[13.884395 0.21537136]
[14.879062 0.5347835 ]]
======> End! ====================================
I successfully tested my trained model by manually moving the generated "h_loss_normalizemodel.ckpt" files into the folder "h_loss_normalize". But why is the number of generated files 15? Your model folder contains only 3 "ckpt" files.
Your checkpoint file contains
model_checkpoint_path: "model.ckpt"
while mine is:
model_checkpoint_path: "/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt-149999"
all_model_checkpoint_paths: "/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt-146000"
all_model_checkpoint_paths: "/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt-147000"
all_model_checkpoint_paths: "/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt-148000"
all_model_checkpoint_paths: "/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt-149000"
all_model_checkpoint_paths: "/home/chenxy/unsupervisedDeepHomographyRAL2018-master/main_log/docker_folder/post_estimation/models/synthetic_models/h_loss_normalizemodel.ckpt-149999"
Does this have any bad effect on the test results?
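As long as tf.train.latest_checkpoint resolves to a valid checkpoint, the extra files should not change the test results. On the count: a TF1 tf.train.Saver keeps max_to_keep=5 recent checkpoints by default, and each checkpoint is written as three files (.data-00000-of-00001, .index, .meta), giving 5 x 3 = 15; the provided folder presumably holds a single checkpoint, hence 3 files. A sketch, assuming a standard Saver rather than this repo's exact training code:

```python
import tensorflow as tf

# Keep only the newest checkpoint instead of the default five; each
# checkpoint is still three files on disk (.data, .index, .meta).
saver = tf.train.Saver(max_to_keep=1)
```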