cambridge-mlg/cnaps

About the training time

Opened this issue · 9 comments

Hi, thank you for sharing your work! These days I have been running some experiments on Meta-Dataset using your code. I used an RTX 3090 to train a model on ImageNet only, but found that every 500 episodes takes about 38 minutes, which is very slow. Is this speed reasonable?

jfb54 commented

No, that speed is not reasonable - especially on an NVidia 3090. It should be at least 10x faster than that. By any chance, are you using TensorFlow v2? If you are, the solution is to use TensorFlow 1.15.

Some explanation:
To ensure that results are comparable, we use the Meta-Dataset reader supplied by the Meta-Dataset team at https://github.com/google-research/meta-dataset. This reader is built using TensorFlow v1 APIs and is resource hungry in terms of the number of files it opens and CPU memory that it uses. While TensorFlow v2 can emulate v1 APIs, it turns out to be considerably slower.
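
A quick way to confirm which TensorFlow version the reader is actually picking up (just a sanity check; adjust the interpreter name to your environment):
python -c "import tensorflow as tf; print(tf.__version__)"
It should print 1.15.x.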

Let me know if this helps. If not, we can help you find the root cause.

Hi, I was already using TensorFlow 1.15. I tried again just now with the command "python run_cnaps.py --feature_adaptation film -i 20000 -lr 0.001 --batch_normalization task_norm-i --data_path /my_data_path", but got similar results. There are also some strange warnings:
"OMP: Info #171: KMP_AFFINITY: 05 proc 77 maps to package 1 core 5 thread 1
OMP: Info #171: KMP_AFFINITY: os proc 30 maps to package 1 core 8 thread o
OMP: Info #171: KMP_AFFINITY: 0S proc 78 maps to package 1 core 8 thread 1
OMP: Info #171: KMP_AFFINITY: oS proc 31 maps to package 1 core 9 thread 0
OMP: Info #171: KMP_AFFINITY: 0s proc 79 maps to package 1 core 9 thread 1
OMP: Info #171: KMP_AFFINITY: os proc 32 naps to package 1 core 10 thread 0
OMP: Info #171: KMP_AFFINITY: os proc 80 naps to package 1 core 10 thread 1
OMP: Info #171: KNP_AFFINITY: os proc 33 naps to package 1 core 11 thread 0
OMP: Info #171: KMP_AFFINITY: 0S proc 81 maps to package 1 core 11 thread 1
OMP: Info #171: KNP_AFFINITY: os proc 34 maps to package 1 core 12 thread 0
OMP: Info #171: KNP_AFFINITY: 0S proc 82 naps to package 1 core 12 thread 1
OMP: Info #171: KMP_AFFINITY: os proc 35 naps to package 1 core 13 thread 0
OMP: Info #171: KNP_AFFINITY: oS proc 83 naps to package 1 core 13 thread 1
OMP: Info #171: KNP_AFFINITY: os proc 36 naps to package 1 core 16 thread 0
OMP: Info #171: KMP_AFFINITY: 05 proc 84 naps to package 1 core 16 thread 1
OMP: Info #171: KMP_AFFINITY: 0S proc 37 naps to package 1 core 17 thread 0
OMP: Info #171: KMP_AFFINITY: 0S proc 85 naps to package 1 core 17 thread 1
OMP: Info #171: KNP_AFFINITY: os proc 38 naps to package 1 core 18 thread 0
OMP: Info #171: KNP_AFFINITY: 0s proc 86 naps to package 1 core 18 thread 1
OMP: Info #171: KNP_AFFINITY: os proc 39 maps to package 1 core 19 thread 0
OMP:Info #171: KMP_AFFINITY: os proc 87 maps to package 1 core 19 thread 1
OMP: Info #171: KMP_AFFINITY: os proc 40 naps to package 1 core 20 thread 0
OMP: Info #171: KMP_AFFINITY: 0S proc 88 naps to package 1 core 20 thread 1
OMP: Info #171: KMP_AFFINITY: oS proc 41 maps to package 1 core 21 thread 0
OMP: Info #171: KMP_AFFINITY: 0S proc 89 naps to package 1 core 21 thread 1
OMP: Info #171: KNMP_AFFINITY: os proc 42 maps to package 1 core 24 thread 0
ONP: Info #171: KNP_AFFINITY: oS proc 90 naps to package 1 core 24 thread 1
OMP: Info #171: KNP_AFFINITY: os proc 43 naps to package 1 core 25 thread 0
OMP: Info #171: KMP_AFFINITY: os proc 91 naps
OMP: Info #250: KNP_AFFINITY: pid 137352 tid 137558 thread 1 bound to oS proc set 1
OMP: Info #250:KNP_AFFINITY: pid 137352 tid 137561 thread 2 bound to oS proc set 2
OMP: Info #250:KNP_AFFINITY: pid 137352 tid 137565 thread 3 bound to os proc set 3"
I sincerely hope you can help me solve this problem. Thank you!

jfb54 commented

These are just TensorFlow log messages. To silence them, see: https://stackoverflow.com/questions/57385766/disable-tensorflow-log-information

I strongly recommend using TensorFlow 1.15 when using the Meta-Dataset reader. CNAPs code will run considerably faster.
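
For example (a minimal sketch, assuming a Linux shell - TF_CPP_MIN_LOG_LEVEL controls TensorFlow's C++ logging, and the KMP_AFFINITY lines come from the OpenMP runtime):
export TF_CPP_MIN_LOG_LEVEL=3   # hide TensorFlow INFO/WARNING/ERROR messages
export KMP_AFFINITY=noverbose   # ask the OpenMP runtime not to print affinity mappings
python run_cnaps.py ...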

Thank you for the quick reply! I reconfirmed my TensorFlow version and it was indeed 1.15. But I found that GPU utilization is very low - it was 0 most of the time.

jfb54 commented

Yes, it sounds like the slowness is not in the GPU portion of the code, but in the data reading. Having more CPU memory (both physical RAM and swap space) will help. When training with Meta-Dataset, I set my swap space to 500 GB on an SSD drive. It also helps to have the Meta-Dataset data on an SSD as well.
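
For reference, setting up a large swap file on an SSD under Linux looks something like this (a sketch only - the path and size are placeholders for your own setup):
sudo fallocate -l 500G /ssd/swapfile
sudo chmod 600 /ssd/swapfile
sudo mkswap /ssd/swapfile
sudo swapon /ssd/swapfile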

OK, thank you! I'll try again.

Hi, I am sorry to bother you again. Following your suggestion last time, I tried experimenting with larger SSD swap space, but the training speed still remains very slow (about 30 min / 500 episodes). I printed the time of each stage during training and found that the most time-consuming part is episode sampling. I looked up a lot of information about this, and some people said it might be related to the CUDA version, so could you please tell me the CUDA version and other software configuration you trained with?

jfb54 commented

No problem. We want to help you to get this to work. I just ran this myself using:
TensorFlow 1.15
PyTorch 1.10.2+cu113
CUDA 11.3
Python 3.7.12
NVIDIA 3090 GPU with 24 GB

I ran this command line:
python run_cnaps.py --feature_adaptation film -i 20000 -lr 0.001 --batch_normalization task_norm-i --data_path /scratch2/jfb54/tf-meta-dataset/records/ --dataset ilsvrc_2012 --test_datasets ilsvrc_2012

I changed line 311 in run_cnaps.py to:
use_two_gpus = False
since with 24 GB you don't need two GPUs; there is enough memory on one GPU.

It took roughly 30 minutes to complete 2000 tasks.

Note that the first 1000 tasks are slower than subsequent ones as the reader is slow swapping in all the image data.

Hope this helps.

OK, thank you very much!