Upgrade cuda and cudnn versions in Docker image?

Question

deep-diver opened this issue a year ago · 9 comments

Starting with TensorFlow 2.12, the recommended versions of CUDA and cuDNN are 11.8 and 8.6. However, it seems that the latest TFX Docker image only supports 11.3 and 8.2. I think these legacy versions led to some significant problems when I used KerasNLP (in the Trainer component), as follows:

Node: 'StatefulPartitionedCall'
RET_CHECK failure (tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:618) dnn != nullptr 
	 [[{{node StatefulPartitionedCall}}]] [Op:__inference_train_function_241897]

Is there any plan to update them, or is there a guide for people who want to update them on their own?

Answer 1 · 2023-06-20T06:54:23.000Z

@deep-diver, thank you for bringing this up. We will discuss this internally and update this thread.

Answer 2 · 2023-06-20T07:00:31.000Z

@singhniraj08 Thank you! Really looking forward to the updates :)

Answer 3 · 2023-06-21T08:36:21.000Z

@deep-diver,

We discussed this issue internally. The upgrade is pending on Python 3.10 support, which is in progress. The image will be upgraded with updated dependencies once Python 3.10 support is introduced.

We will update this thread once Python 3.10 support is introduced and the CUDA and cuDNN dependencies are updated.
Thank you.

Answer 4 · 2023-06-21T08:43:08.000Z

@singhniraj08 thanks for the updates

Can you share a rough ETA?

Answer 5 · 2023-09-08T02:48:47.000Z

TFX 1.14.0 has been released, and it now supports Python 3.10.

Answer 6 · 2023-09-08T05:25:09.000Z

@briron thanks!
How about the CUDA dependencies?

Answer 7 · 2023-09-08T05:37:46.000Z

@deep-diver,

The latest TFX Docker image supports CUDA 11.8 and cuDNN 8.9. Please try the latest Docker image and let us know if you face any issues. Thank you!
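
One way to confirm what a given image actually ships is to ask TensorFlow which CUDA and cuDNN versions it was built against. The following is a minimal sketch, not an official TFX check: the helper names (`parse_version`, `at_least`, `meets_minimum`) are hypothetical, the default minimums are the 11.8 and 8.6 figures quoted earlier in this thread, and it assumes `tf.sysconfig.get_build_info()` exposes `cuda_version` and `cudnn_version` keys, which is typically only the case for GPU builds. Some builds report only a major version (for example `8` for cuDNN), so the comparison uses only as many components as are reported.

```python
def parse_version(text):
    """Turn a string such as '11.8' into a tuple of ints such as (11, 8)."""
    return tuple(int(part) for part in str(text).split(".") if part.isdigit())


def at_least(found, minimum):
    """Compare only as many components as `found` reports; empty means unknown."""
    return bool(found) and found >= minimum[: len(found)]


def meets_minimum(build_info, min_cuda=(11, 8), min_cudnn=(8, 6)):
    """Check a build-info mapping against the assumed minimum versions."""
    cuda = parse_version(build_info.get("cuda_version", ""))
    cudnn = parse_version(build_info.get("cudnn_version", ""))
    return at_least(cuda, min_cuda) and at_least(cudnn, min_cudnn)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        info = tf.sysconfig.get_build_info()
        print("CUDA build version: ", info.get("cuda_version"))
        print("cuDNN build version:", info.get("cudnn_version"))
        print("Visible GPUs:       ", tf.config.list_physical_devices("GPU"))
        print("Meets minimum:      ", meets_minimum(info))
```

Running this inside the container (for example in the Trainer's environment) reports the versions TensorFlow was compiled for. It does not prove that the matching libraries are installed in the image, so an empty GPU list or a `dnn != nullptr` failure can still point to a missing or mismatched cuDNN.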

Answer 8 · 2023-09-16T01:45:28.000Z

This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

Answer 9 · 2023-09-24T01:47:17.000Z

This issue was closed due to lack of activity after being marked stale for the past 7 days.