NVIDIA/nvidia-docker

ARM64 Support

scosaje opened this issue · 44 comments

Is there a version of nvidia-docker for the ARM platform, specifically for the NVIDIA Jetson TK1 and TX1?

3XX0 commented

Not right now but we will probably look into it for nvidia-docker 2.0

I made the same request to some NVIDIA engineers during GTC 2016 in Amsterdam. I hope this feature will come as soon as possible. It would be very interesting to use containers on the TX1 SoM.

Have you succeeded running docker on TK1/TX1?

On Mon, Oct 10, 2016 at 22:16, scosaje notifications@github.com wrote:

Is there a version of the nvidia-docker for arm platform. specifically for
the NVidia Jetson TK1 and TX1



Thanks, guys. Would it be possible to use Linux "cgroups on Mesos" on ARM as
an alternative container in place of the nvidia-docker container while we
await an implementation of nvidia-docker for ARM?
Where cgroups will not suffice, I am willing to commit some funds to a
focused effort on nvidia-docker for ARM (TK1/TX1), which can then be
donated to the community.

If anyone is able and interested, please send mail directly to
sunny.osaje@gmail.com.

Thanks.

On Wed, Oct 12, 2016 at 11:54 AM, Gotrek77 notifications@github.com wrote:

I made the same request to some Nvidia Engineers during last GTC2016 in
Amsterdam. I hope that this feature will come as soon as possible. It's
very interesting to use container into TX1 som.



@3XX0 Hi Jonathan, when do you think we will see ARM support?
Also, would it then be able to run the TensorFlow Docker images, or would we need to rebuild them?
Thanks

3XX0 commented

I can't really say, we have other priorities right now and the ARM support is somewhat tricky.
I started looking into it but our driver is dramatically different on ARM and we would need extensive changes in nvidia-docker. We also need to tweak the L4T kernel to make it container friendly.
I will update this issue if we make any progress.

Hello 3XX0, is there any news on nvidia-docker support for the Jetson TX1?

Thanks

Also interested in this topic.

@3XX0, can you describe in more detail what changes would be needed to support 64-bit ARM (targeting the TX1)? It seems this is a rather popular topic. My group has some bandwidth and wouldn't mind looking into this.

Thanks,

3XX0 commented

It's pretty complicated and it's going to take a lot of time.

First, we won't support ARM until the new 2.0 runtime becomes stable and works flawlessly on x86 (hopefully we will publish it soon). Secondly, we need to work with the L4T kernel team to support containers there. And lastly, we need to reimplement all the driver logic to work with the Tegra iGPU/dGPU stack bearing in mind that it needs to support all of our architectures (Drive PXs will probably be first) and that our Tegra drivers are drastically changing.

Hi @3XX0, what can we do to raise the priority of this feature? It's very important to us to have nvidia-docker on the TX1.

Hello guys, is there any news about an nvidia-docker version that supports the Jetson TK1?
I am very interested in it. Thank you.

Hello guys,
now that 1.0.0 has been released, when do you think work will start on 2.0.0 (with Jetson TX1 support)? When will it be possible to try it (in alpha or beta)?
Thanks in advance
Giuseppe

Hello guys,
any update?
Thanks
Giuseppe

I just got my TX2 and I am also interested in running everything as docker containers, so an update on this will be very helpful

flx42 commented

@Gotrek77 @GONZALORUIZ sorry but you're on your own for now, it's not on our near-term roadmap.

We are developing a vision application that runs on top of a Docker environment, and our customers are looking for a local solution where the TX2 would fit very nicely. Could you please increase the priority of this request?

@flx42 we need this for our product. Several people have asked to raise the priority, or asked how to push your management in this direction. It is very important for us to know the roadmap for this feature.
Thanks

flx42 commented

@flesicek @Gotrek77 I understand that this is causing an issue for you. However, there is nothing I can do right now, it's in the backlog but not on the roadmap for now.
You are on your own if you want to try to make it work.

Bump, because Docker would be great. Also looking for a cloud GPU platform access API.

kgdad commented

As an alternative to nvidia-docker until official support is available, we were able to get Docker running on the TX2. It requires some kernel modifications and passing some parameters to Docker containers so they have access to the GPU, but it is working for those who want to try it.

You can check out this GitHub repo for more information, Tegra-Docker

@kgdad Thanks for that repo, I own a TX1 and I'll work on adapting your approach for my device.
@flx42 @3XX0 once we have a docker version running on our L4T, do you have some pointers on how to adapt nvidia-docker to run on our devices? - my aim is to run minpy on TX1 from docker.

I installed docker on my TX1 today. I ran into a make issue with CONFIG_CGROUP_HUGETLB while building the custom kernel, so I omitted the optional CONFIG_CGROUP_HUGETLB change and the Image was built successfully. Docker images (e.g. FROM arm64v8/ubuntu:16.04) can now be used on the TX1.

The Tegra-Docker solution works (I have verified), but still feels a little like a work-around. Also, I believe the docker version (1.12.6 currently) they recommend is a little dated. It might be better to build from source or use a debian package. BTW, the Tegra-Docker solution sources/references the Jetsonhacks blog (link below) for custom kernel build instructions...

Tegra-Docker:
https://github.com/Technica-Corporation/Tegra-Docker

building custom kernels: TX1/TX2
http://www.jetsonhacks.com/2017/08/07/build-kernel-ttyacm-module-nvidia-jetson-tx1/
https://github.com/jetsonhacks/buildJetsonTX1Kernel

docker-ce arm64: stable
https://download.docker.com/linux/ubuntu/dists/xenial/pool/stable/arm64/

docker-ce install instructions: ubuntu
https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/

To check kernel compatibility with docker, run the check-config.sh script from the docker site:
https://docs.docker.com/engine/installation/linux/linux-postinstall/#chkconfig

-K


nvidia@tegra-ubuntu:~/bin$ bash check-config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:

  • cgroup hierarchy: properly mounted [/sys/fs/cgroup]
  • CONFIG_NAMESPACES: enabled
  • CONFIG_NET_NS: enabled
  • CONFIG_PID_NS: enabled
  • CONFIG_IPC_NS: enabled
  • CONFIG_UTS_NS: enabled
  • CONFIG_CGROUPS: enabled
  • CONFIG_CGROUP_CPUACCT: enabled
  • CONFIG_CGROUP_DEVICE: enabled
  • CONFIG_CGROUP_FREEZER: enabled
  • CONFIG_CGROUP_SCHED: enabled
  • CONFIG_CPUSETS: enabled
  • CONFIG_MEMCG: enabled
  • CONFIG_KEYS: enabled
  • CONFIG_VETH: enabled
  • CONFIG_BRIDGE: enabled
  • CONFIG_BRIDGE_NETFILTER: enabled (as module)
  • CONFIG_NF_NAT_IPV4: enabled
  • CONFIG_IP_NF_FILTER: enabled
  • CONFIG_IP_NF_TARGET_MASQUERADE: enabled
  • CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
  • CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
  • CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
  • CONFIG_IP_NF_NAT: enabled
  • CONFIG_NF_NAT: enabled
  • CONFIG_NF_NAT_NEEDED: enabled
  • CONFIG_POSIX_MQUEUE: enabled
  • CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:

  • CONFIG_USER_NS: enabled
  • CONFIG_SECCOMP: enabled
  • CONFIG_CGROUP_PIDS: enabled
  • CONFIG_MEMCG_SWAP: enabled
  • CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
  • CONFIG_MEMCG_KMEM: enabled
  • CONFIG_BLK_CGROUP: enabled
  • CONFIG_BLK_DEV_THROTTLING: enabled
  • CONFIG_IOSCHED_CFQ: enabled
  • CONFIG_CFQ_GROUP_IOSCHED: enabled
  • CONFIG_CGROUP_PERF: enabled
  • CONFIG_CGROUP_HUGETLB: missing
  • CONFIG_NET_CLS_CGROUP: enabled
  • CONFIG_CGROUP_NET_PRIO: enabled
  • CONFIG_CFS_BANDWIDTH: enabled
  • CONFIG_FAIR_GROUP_SCHED: enabled
  • CONFIG_RT_GROUP_SCHED: enabled
  • CONFIG_IP_VS: enabled
  • CONFIG_IP_VS_NFCT: enabled
  • CONFIG_IP_VS_RR: enabled
  • CONFIG_EXT4_FS: enabled
  • CONFIG_EXT4_FS_POSIX_ACL: enabled
  • CONFIG_EXT4_FS_SECURITY: enabled
  • Network Drivers:
  • "overlay":
  • CONFIG_VXLAN: enabled
    Optional (for encrypted networks):
  • CONFIG_CRYPTO: enabled
  • CONFIG_CRYPTO_AEAD: enabled
  • CONFIG_CRYPTO_GCM: enabled
  • CONFIG_CRYPTO_SEQIV: enabled
  • CONFIG_CRYPTO_GHASH: enabled
  • CONFIG_XFRM: enabled
  • CONFIG_XFRM_USER: enabled
  • CONFIG_XFRM_ALGO: enabled
  • CONFIG_INET_ESP: enabled
  • CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  • "ipvlan":
  • CONFIG_IPVLAN: enabled
  • "macvlan":
  • CONFIG_MACVLAN: enabled
  • CONFIG_DUMMY: enabled
  • "ftp,tftp client in container":
  • CONFIG_NF_NAT_FTP: enabled
  • CONFIG_NF_CONNTRACK_FTP: enabled
  • CONFIG_NF_NAT_TFTP: enabled
  • CONFIG_NF_CONNTRACK_TFTP: enabled
  • Storage Drivers:
  • "aufs":
  • CONFIG_AUFS_FS: missing
  • "btrfs":
  • CONFIG_BTRFS_FS: enabled
  • CONFIG_BTRFS_FS_POSIX_ACL: enabled
  • "devicemapper":
  • CONFIG_BLK_DEV_DM: enabled
  • CONFIG_DM_THIN_PROVISIONING: enabled
  • "overlay":
  • CONFIG_OVERLAY_FS: enabled
  • "zfs":
  • /dev/zfs: missing
  • zfs command: missing
  • zpool command: missing

Limits:

  • /proc/sys/kernel/keys/root_maxkeys: 1000000
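The check above can also be narrowed to a handful of options. The following is a minimal sketch and is not part of Docker's check-config.sh: the function name `check_kernel_opts` is hypothetical, and it assumes the kernel config text is piped in on stdin (for example from `/proc/config.gz`, the file the script reports reading).

```shell
# Hypothetical helper (not from check-config.sh): report whether each named
# kernel option is set to y or m in a config stream read from stdin.
# Unlike check-config.sh, it does not distinguish "enabled (as module)".
check_kernel_opts() {
    config=$(cat)    # slurp the kernel config text from stdin
    for opt in "$@"; do
        if printf '%s\n' "$config" | grep -q "^${opt}=[ym]\$"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}

# Example invocation on the device (assumes /proc/config.gz exists):
#   zcat /proc/config.gz | check_kernel_opts CONFIG_CGROUP_HUGETLB CONFIG_OVERLAY_FS
```

Passing the config on stdin keeps the function usable with a saved `.config` file from the kernel build tree as well.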

nvidia@tegra-ubuntu:~/bin$ docker info
Containers: 4
Running: 1
Paused: 0
Stopped: 3
Images: 6
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-179:33-2899445-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 450.8 MB
Data Space Total: 107.4 GB
Data Space Available: 27.04 GB
Metadata Space Used: 1.192 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.146 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use --storage-opt dm.thinpooldev to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.38-jetsonbot-doc-v0.3
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 3.888 GiB
Name: tegra-ubuntu
ID: GGQY:ZUKT:AEXF:NPL2:JDFZ:UHGJ:7XVI:PXVI:FQK2:F6MZ:PP4A:PEVD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8


nvidia@tegra-ubuntu:~/docker/dq$ docker run --device=/dev/nvhost-ctrl --device=/dev/nvhost-ctrl-gpu --device=/dev/nvhost-prof-gpu --device=/dev/nvmap --device=/dev/nvhost-gpu --device=/dev/nvhost-as-gpu -v /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra device_query
/cudaSamples/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3981 MBytes (4174815232 bytes)
( 2) Multiprocessors, (128) CUDA Cores/MP: 256 CUDA Cores
GPU Max Clock rate: 998 MHz (1.00 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = NVIDIA Tegra X1
Result = PASS
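The long `docker run` line above can be assembled from a list instead of typed by hand. This is a minimal sketch, assuming the same six device nodes and the same Tegra library directory as in that command; the function name `tegra_docker_args` is made up for illustration and is not part of nvidia-docker or Tegra-Docker.

```shell
# Hypothetical helper: print the docker arguments that expose the Tegra GPU
# device nodes and mount the host driver libraries, as in the command above.
tegra_docker_args() {
    for dev in nvhost-ctrl nvhost-ctrl-gpu nvhost-prof-gpu nvmap nvhost-gpu nvhost-as-gpu; do
        printf '%s ' "--device=/dev/$dev"
    done
    # Bind-mount the host's Tegra libraries at the same path in the container.
    printf '%s %s\n' "-v" "/usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra"
}

# Example invocation (left unquoted on purpose so the shell splits the flags;
# this only works because none of the paths contain spaces):
#   docker run $(tegra_docker_args) device_query
```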

check-config.sh.txt

Hi all,
I want to try nvidia-docker 2 on the TX1. Is it possible to build libnvidia-docker for arm64, or is there some issue at the moment?

Thanks
Giuseppe

Any updates?

+1 for this feature.

What's the point of having the docker-friendly kernel in the jetpack 3.2 if there's no docker for the jetson tx2 running the jetpack? :(

kgdad commented

Regular docker will work fine as long as you don't need GPU access. If you do need access to the GPU in your containers, you need to make sure you give your containers access to some specific libraries and devices. You can read the details here: GitHub-Tegra_Docker

Is this a feature on the current roadmap for nvidia-docker?

Although the above comment points to a valid workaround, it doesn't have all the nice features that nvidia-docker provides... 😞

flx42 commented

Sorry, it's still not on the roadmap, given that the driver stacks are very different today.

We use Tegra-Docker on the TX2 as a workaround, but we really hope that nvidia-docker can officially add support for the TX2 platform.

Is there any chance that this feature will appear on the roadmap given the recent release of the Jetson Xavier?

Hi, Is such a feature planned now for the newer devices? Any roadmaps ?

I am waiting too..!

I am waiting too

... also waiting!

Hi,

I have created a docker image on a Jetson TX2 which contains the NVIDIA drivers, CUDA and cuDNN libraries. I am trying to give this image access to the GPU and CUDA drivers through the tx2-docker script (https://github.com/Technica-Corporation/Tegra-Docker), but without success. I think tx2-docker is running successfully, as you can see below:

wkh@tegra-ubuntu:~/Tegra-Docker/bin$ ./tx2-docker run openhorizon/aarch64-tx2-cudabase
Running an nvidia docker image
docker run -e LD_LIBRARY_PATH=:/usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu/tegra:/usr/local/cuda/lib64 --net=host -v /usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu -v /usr/local/cuda/lib64:/usr/local/cuda/lib64 --device=/dev/nvhost-ctrl --device=/dev/nvhost-ctrl-gpu --device=/dev/nvhost-prof-gpu --device=/dev/nvmap --device=/dev/nvhost-gpu --device=/dev/nvhost-as-gpu openhorizon/aarch64-tx2-cudabase

But when I try to run deviceQuery inside my container, it gives me this result:

root@bc1130fc6be4:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

Any comments? Why is this script not giving access?
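One way to narrow this down is to check, from inside the container, whether the paths passed on the `tx2-docker` command line are actually present. This is only a sketch: the function name `missing_gpu_paths` is hypothetical, the path list is copied from the command shown in this comment, and a missing path is just one possible cause of the `cudaGetDeviceCount returned 38` result, not a confirmed diagnosis.

```shell
# Hypothetical helper: print every expected GPU device node or library
# directory that does not exist under the given root (default: /).
missing_gpu_paths() {
    root=${1:-}
    for p in /dev/nvhost-ctrl /dev/nvhost-ctrl-gpu /dev/nvhost-prof-gpu \
             /dev/nvmap /dev/nvhost-gpu /dev/nvhost-as-gpu \
             /usr/lib/aarch64-linux-gnu/tegra /usr/local/cuda/lib64; do
        [ -e "$root$p" ] || echo "$p"
    done
}

# Example invocation inside the container; no output means all paths exist:
#   missing_gpu_paths
```

The optional root argument is there so the same check can be pointed at an unpacked image directory on the host.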

How's this feature coming along?

Looking for this for the Jetson Nano.

Could this run on a non-NVIDIA platform? I mean, if I use an ARM device with a GPU, could I install a docker version with GPU support so that my application can make use of the GPU on that ARM device?

This is the support matrix for the 1.1.0 package, released on Tuesday 19-May:

+----------------------+-------------+----------------+---------+-----------------+
|  OS Name / Version   |  Identifier | amd64 / x86_64 | ppc64le | arm64 / aarch64 |
+======================+=============+================+=========+=================+
| Amazon Linux 1       | amzn1       |       X        |         |                 |
| Amazon Linux 2       | amzn2       |       X        |         |                 |
| Amazon Linux 2017.09 | amzn2017.09 |       X        |         |                 |
| Amazon Linux 2018.03 | amzn2018.03 |       X        |         |                 |
| Open Suse Leap 15.0  | sles15.0    |       X        |         |                 |
| Open Suse Leap 15.1  | sles15.1    |       X        |         |                 |
| Debian Linux 9       | debian9     |       X        |         |                 |
| Debian Linux 10      | debian10    |       X        |         |                 |
| Centos 7             | centos7     |       X        |    X    |                 |
| Centos 8             | centos8     |       X        |    X    |        X        |
| RHEL 7.4             | rhel7.4     |       X        |    X    |                 |
| RHEL 7.5             | rhel7.5     |       X        |    X    |                 |
| RHEL 7.6             | rhel7.6     |       X        |    X    |                 |
| RHEL 7.7             | rhel7.7     |       X        |    X    |                 |
| RHEL 8.0             | rhel8.0     |       X        |    X    |        X        |
| RHEL 8.1             | rhel8.1     |       X        |    X    |        X        |
| RHEL 8.2             | rhel8.2     |       X        |    X    |        X        |
| Ubuntu 16.04         | ubuntu16.04 |       X        |    X    |                 |
| Ubuntu 18.04         | ubuntu18.04 |       X        |    X    |        X        |
| Ubuntu 19.04         | ubuntu19.04 |       X        |    X    |        X        |
| Ubuntu 19.10         | ubuntu19.10 |       X        |    X    |        X        |
| Ubuntu 20.04         | ubuntu20.04 |       X        |    X    |        X        |
+----------------------+-------------+----------------+---------+-----------------+

Please let us know if this resolves your issue.
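For scripting, the arm64 column of the table above can be turned into a simple lookup. This is a sketch only: the function name `arm64_supported` is hypothetical, and the identifier list is transcribed from the 1.1.0 matrix in this comment, so it says nothing about other releases.

```shell
# Hypothetical helper: succeed if the given distribution identifier has an
# "X" in the arm64 / aarch64 column of the 1.1.0 support matrix above.
arm64_supported() {
    case "$1" in
        centos8|rhel8.0|rhel8.1|rhel8.2) return 0 ;;
        ubuntu18.04|ubuntu19.04|ubuntu19.10|ubuntu20.04) return 0 ;;
        *) return 1 ;;
    esac
}

# Example invocation (assumes /etc/os-release provides ID and VERSION_ID,
# which concatenate to identifiers such as "ubuntu18.04"):
#   distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
#   arm64_supported "$distribution" && echo "arm64 packages listed"
```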

Listed as supported. Closed as resolved.