ROCm/ROCm-docker

Versioned Docker-Container

psychocoderHPC opened this issue · 5 comments

Please provide docker recipes where it is possible to select a ROCm version.
We are trying to integrate ROCm tests into our projects PIConGPU, alpaka and cupla.
Currently the recipes use AMD's apt repositories, e.g. for Ubuntu. The problem is that AMD does not provide a way to install an older version of ROCm via apt; apt always ships the latest release.
If we only integrate the latest release of ROCm into our CI, AMD could update the containers, so that one pull request in our projects is checked with one ROCm version and the next with a different version.
CI tests could therefore fail even though no HIP/ROCm code in our projects was touched, because of a hidden version update of the ROCm containers.
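
One way to make such a hidden update fail loudly instead of silently is a version guard at the start of the CI job. The following is only a sketch: `rocm_version_matches` is a hypothetical helper, and how the installed version is detected inside the container is left open here.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the found version equals the expected
# release or is a patch/build of it (4.1 accepts 4.1, 4.1.0, 4.1-26, not 4.10).
rocm_version_matches() {
    expected="$1"
    found="$2"
    case "$found" in
        "$expected"|"$expected".*|"$expected"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with literal values; a real CI job would pass the version it
# detected inside the container as the second argument.
if rocm_version_matches 4.1 4.1.0; then
    echo "ROCm version OK"
else
    echo "unexpected ROCm version" >&2
fi
```

With such a check, a container that was silently moved to another release would stop the job before any project tests run.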

Short: Please provide docker recipes for each ROCm release that can be used even if there is a new ROCm release.

IMO there should be additional docker recipes available that build the full stack from source instead of pulling pre-built binaries.

It is possible to select a specific ROCm version by using the URL corresponding to that release from the repo. For Ubuntu, the available releases are listed here: http://repo.radeon.com/rocm/apt/

A user may choose to "lock" the ROCm release installed on their system or in Docker by using one of these URLs in place of the main repo when setting sources during installation, e.g.:

wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/3.9/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
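
To avoid hard-coding the release in several places, the source line can be generated from a single version value. This is a minimal sketch: `rocm_sources_line` is a hypothetical helper, and it assumes the chosen version actually exists as a directory under http://repo.radeon.com/rocm/apt/ and that the codename matches what that release publishes.

```shell
#!/bin/sh
# Hypothetical helper: print the apt source line for one pinned ROCm release.
# The URL layout follows the versioned directories under repo.radeon.com/rocm/apt/.
rocm_sources_line() {
    version="$1"    # e.g. 3.9
    codename="$2"   # e.g. xenial
    printf 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/%s/ %s main\n' \
        "$version" "$codename"
}

# Example: print the line for ROCm 3.9.
rocm_sources_line 3.9 xenial
```

The output can be piped to `sudo tee /etc/apt/sources.list.d/rocm.list` exactly as in the command above, so that only the version argument changes between releases.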

In my opinion, the version-tagged docker images should (probably) use the above versioned URLs to prevent versioning issues during apt upgrade, or when installing a ROCm package not included natively with the Docker image, etc.

@sunway513 -- I would consider this an enhancement to the current Docker structure, but I am less sure how the TF's / PT's of the world use Docker, and whether the above would match their expectation. What are your thoughts?

Hi @arghdos , thanks for the suggestions!
The TF and PyT base docker containers do point to the exact ROCm release in their base Dockerfile, e.g. the latest ROCm 4.0.1 docker container uses the following release link:
http://repo.radeon.com/rocm/apt/4.0.1/

And yes, it also makes sense to me to have the base dev dockerfiles hosted in this repo point to versioned URLs. We'll try to adopt this change in the docker images for the next release.

I opened a pull request to update the ROCm documentation too.

Hi @psychocoderHPC , we have pushed out the ROCm 4.1 docker container, which links to the versioned ROCm release repo below:
http://repo.radeon.com/rocm/apt/4.1/
I'm closing this issue for now, thanks a lot for your suggestions.