Kali-Hac/TranSG

TensorFlow and PyTorch both detect the GPU and CUDA correctly, but the code runs on the CPU only.

Closed this issue · 1 comment

I. Device: RTX 3090, driver version 515.86.01, CUDA version 11.7, Python 3.8.16.
For TensorFlow, I installed nvidia-tensorflow 1.15.5+nv22.05,
and for PyTorch, torch 1.13.0, torchvision 0.14.0 and torchaudio 0.13.0.
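To collect the version information above in one place, and to see which CUDA version each framework was actually built against (this can differ from the CUDA version the driver reports), a small diagnostic script may help. This is a sketch: `report_environment` is a hypothetical helper name, and each framework is only queried if it is installed.

```python
import importlib.util
import platform


def report_environment():
    """Collect Python, PyTorch and TensorFlow build/GPU information as a dict."""
    info = {"python": platform.python_version()}

    # Only import a framework if it is installed, so the script never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch
        info["torch"] = torch.__version__
        info["torch_built_cuda"] = torch.version.cuda  # None for CPU-only builds
        info["torch_cuda_available"] = torch.cuda.is_available()
        if torch.cuda.is_available():
            info["torch_gpu"] = torch.cuda.get_device_name(0)

    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
        info["tf_built_with_cuda"] = tf.test.is_built_with_cuda()

    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```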

II. When I run print(tf.test.is_gpu_available()), it returns True,
and print(torch.cuda.is_available()) also returns True.
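Note that `torch.cuda.is_available()` returning True only means PyTorch can see a CUDA device; it does not move any work there. PyTorch creates new tensors and module parameters on the CPU unless they are moved with `.to(device)` (or created with `device=`). TensorFlow 1.x behaves differently: when an op has both CPU and GPU kernels, it is placed on the GPU by default. The sketch below illustrates the PyTorch behavior with a hypothetical helper, `torch_device_check`; it is not taken from TranSG.py.

```python
import importlib.util


def torch_device_check():
    """Return (device of a freshly created tensor, device after an explicit move)."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.randn(4, 3)           # new tensors are allocated on the CPU by default
    before = x.device.type          # "cpu"
    layer = torch.nn.Linear(3, 2)   # module parameters also start on the CPU
    x = x.to(device)                # inputs and model must both be moved explicitly
    layer = layer.to(device)
    y = layer(x)                    # this matmul now runs on `device`
    return before, y.device.type
```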

III. However, when I run python TranSG.py --dataset KGBD --probe probe in VS Code,
it only runs on the CPU.
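One way to check whether training actually uses the GPU is to watch `nvidia-smi` from a second terminal while the script runs: sustained utilization and memory use on the card indicate GPU work. A minimal polling sketch, assuming `nvidia-smi` is on the `PATH`; the helper names are hypothetical, while the `--query-gpu` fields and `--format` options are standard `nvidia-smi` flags.

```python
import shutil
import subprocess


def _to_int(field):
    """Convert a CSV field to int, or None for values such as "[N/A]"."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text):
    """Parse 'index, utilization %, memory MiB' lines into (index, util, mem) tuples."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = (_to_int(f) for f in line.split(","))
        rows.append((index, util, mem))
    return rows


def query_gpu_usage():
    """Return current per-GPU utilization and memory use, or None without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    # Run this while `python TranSG.py ...` is training in another terminal.
    print(query_gpu_usage())
```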

Below is my output. Thanks, everyone, for reading and for your help!

python TranSG.py --dataset KGBD --probe probe
2023-03-31 14:48:35.072934: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:tensorflow:From TranSG.py:19: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From TranSG.py:19: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

----- Model hyperparams -----
f (sequence length): 6
H (embedding size): 128
SGT Layers: 2
FR heads: 8
alpha: 0.5
beta: 0.5
lambda: 0.5
a (structure): 10
b (trajectory): 2
t1: 0.07
t2: 14
batch_size: 256
lr: 0.00035
patience: 60
Mode: Train
----- Dataset Information -----
Dataset: KGBD
Probe: probe
WARNING:tensorflow:From TranSG.py:241: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From TranSG.py:241: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From TranSG.py:254: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From TranSG.py:254: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From TranSG.py:254: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From TranSG.py:254: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From TranSG.py:261: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From TranSG.py:261: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From TranSG.py:269: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

WARNING:tensorflow:From TranSG.py:269: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

concat_features (Spatial) Tensor("TranSG/TranSG/concat_13:0", shape=(256, 6, 20, 128), dtype=float32)
WARNING:tensorflow:From TranSG.py:324: The name tf.losses.absolute_difference is deprecated. Please use tf.compat.v1.losses.absolute_difference instead.

WARNING:tensorflow:From TranSG.py:324: The name tf.losses.absolute_difference is deprecated. Please use tf.compat.v1.losses.absolute_difference instead.

WARNING:tensorflow:From TranSG.py:416: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From TranSG.py:416: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From TranSG.py:417: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From TranSG.py:417: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From TranSG.py:417: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

WARNING:tensorflow:From TranSG.py:417: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

WARNING:tensorflow:From TranSG.py:419: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From TranSG.py:419: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2023-03-31 14:48:38.133041: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 3187200000 Hz
2023-03-31 14:48:38.133485: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1dcd43d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-03-31 14:48:38.133496: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-03-31 14:48:38.133969: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-03-31 14:48:38.171983: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-31 14:48:38.172172: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1dd4e680 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-03-31 14:48:38.172187: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2023-03-31 14:48:38.172296: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-31 14:48:38.172352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.755
pciBusID: 0000:01:00.0
2023-03-31 14:48:38.172363: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-03-31 14:48:38.172390: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-03-31 14:48:38.185206: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-03-31 14:48:38.185354: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-03-31 14:48:38.185606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-03-31 14:48:38.185940: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-03-31 14:48:38.185963: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-03-31 14:48:38.185999: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-31 14:48:38.186064: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-31 14:48:38.186099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2023-03-31 14:48:38.186112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-03-31 14:48:38.287833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-03-31 14:48:38.287856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2023-03-31 14:48:38.287860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2023-03-31 14:48:38.288022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-31 14:48:38.288108: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-31 14:48:38.288158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19367 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2023-03-31 14:48:40.127634: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-03-31 14:48:40.467065: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
TranSG.py:610: UserWarning: This overload of addmm_ is deprecated:
addmm_(Number beta, Number alpha, Tensor mat1, Tensor mat2)
Consider using one of the following signatures instead:
addmm_(Tensor mat1, Tensor mat2, *, Number beta, Number alpha) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1420.)
dist_m.addmm_(1, -2, a, b.t())
[0] Batch num: 0 | STPR Loss: 0.10349 | GPC Loss: 200.80025 |
[0] Batch num: 20 | STPR Loss: 0.09568 | GPC Loss: 83.03362 |
[0] Batch num: 40 | STPR Loss: 0.09093 | GPC Loss: 55.42593 |
[0] Batch num: 60 | STPR Loss: 0.08801 | GPC Loss: 41.22432 |
[0] Batch num: 80 | STPR Loss: 0.08391 | GPC Loss: 34.67684 |
[0] Batch num: 100 | STPR Loss: 0.08058 | GPC Loss: 29.46987 |
[0] Batch num: 120 | STPR Loss: 0.07752 | GPC Loss: 26.16352 |
ReID_Models/KGBD/probe_f_6_layers_2_heads_8_alpha_0.5_beta_0.5_lambda_0.5/best.ckpt
[Probe Evaluation] KGBD - probe | Top-1: 0.3280 (0.3280) | Top-5: 0.5291 (0.5291) | Top-10: 0.6189 (0.6189) | mAP: 0.0492 (0.0492) |
0.3280-0.5291-0.6189-0.0492
[1] Batch num: 0 | STPR Loss: 0.07678 | GPC Loss: 119.31063 |
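The `UserWarning` from TranSG.py:610 is unrelated to device placement: it only says that the positional `addmm_(beta, alpha, mat1, mat2)` overload is deprecated, and the keyword form named in the warning gives the same result. The sketch below shows that form inside a squared Euclidean distance computation (||a_i||^2 + ||b_j||^2 - 2 a_i·b_j), the usual purpose of this pattern; the surrounding TranSG.py code is not shown in the issue, so the `pairwise_sq_dist` wrapper is an assumption.

```python
import importlib.util

# PyTorch is imported only if installed, so this sketch can be loaded anywhere.
TORCH_AVAILABLE = importlib.util.find_spec("torch") is not None
if TORCH_AVAILABLE:
    import torch


def pairwise_sq_dist(a, b):
    """Squared Euclidean distances between the rows of a (m x d) and b (n x d)."""
    m, n = a.size(0), b.size(0)
    # ||a_i||^2 repeated across columns plus ||b_j||^2 repeated across rows
    dist_m = (a.pow(2).sum(dim=1, keepdim=True).expand(m, n)
              + b.pow(2).sum(dim=1, keepdim=True).expand(n, m).t())
    # Deprecated call: dist_m.addmm_(1, -2, a, b.t())
    # Keyword form:    dist_m = beta * dist_m + alpha * (a @ b.t())
    dist_m.addmm_(a, b.t(), beta=1, alpha=-2)
    return dist_m
```

Like any PyTorch op, this runs on whichever device `a` and `b` are on, so moving the inputs with `.to(device)` decides whether this step uses the GPU.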

I also want to know.