aws-samples/amazon-sagemaker-tensorflow-object-detection-api

No GPU detected

dragynir opened this issue · 1 comment

Thank you in advance.

Can we use the training Dockerfile to perform training on a GPU?

When running on a SageMaker GPU instance, only the CPU device is available:
Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.

The code I used to check the available devices:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
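For anyone reproducing this, the same check can also be done through the public `tf.config` API, which is available in TF 2.x. This is only a minimal sketch: the helper name `gpu_device_names` is made up for illustration, and returning an empty list when TensorFlow is not installed is an assumption made so the snippet runs anywhere.

```python
def gpu_device_names():
    """Return the names of GPUs visible to TensorFlow.

    Hypothetical helper: returns an empty list when no GPU is visible
    or when TensorFlow is not installed (an assumption for this sketch).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # list_physical_devices("GPU") returns one PhysicalDevice per visible GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    names = gpu_device_names()
    if names:
        print("GPUs visible to TensorFlow:", names)
    else:
        print("No GPU visible to TensorFlow")
```

An empty result on a GPU instance would be consistent with the "Cannot dlopen some GPU libraries" warning, i.e. the CUDA libraries are missing from the image.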

I have updated the Docker image to TF 2.5.0-gpu, and the GPU is now utilized.
To verify this, open the instance metrics for your training job.
This takes you to CloudWatch, where you should see a graph similar to mine:
[Image: GPU utilization graph]
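As a complement to the CloudWatch graph, it may help to confirm from inside the container that the NVIDIA driver tooling can see the GPU. The sketch below assumes `nvidia-smi` is on the `PATH` of the training container, which is not confirmed anywhere in this thread; the function name `nvidia_smi_summary` is hypothetical.

```python
import shutil
import subprocess


def nvidia_smi_summary():
    """Return the output of `nvidia-smi -L`, or None if it cannot be run.

    Hypothetical helper: assumes nvidia-smi is installed in the container.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not on PATH, so nothing can be reported.
        return None
    try:
        # `-L` lists the GPUs the driver can see, one per line.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    summary = nvidia_smi_summary()
    print(summary if summary else "nvidia-smi unavailable or no GPU listed")
```

If this lists a GPU but TensorFlow still reports only the CPU, the image is probably missing the CUDA libraries, which is what switching to a `-gpu` base image addresses.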