Unable to create bridge on the same subnet as the VF
psaini79 opened this issue · 13 comments
I tried to create the bridge using the sriov plugin, but it kept failing on a RoCE ConnectX-5 (CX5) card. I used the same subnet that is already configured on the host devices, i.e. 192.168.10.0/24, but I get the following error:
```
docker network create -d sriov --subnet=192.168.10.0/24 -o netdevice=re6 -o mode=passthrough mynet1
Error response from daemon: Pool overlaps with other one on this address space
```
I am able to create the bridge if I use a different subnet, but for my use case the bridge must be on the same subnet so that I can reach other nodes running on that subnet. Also, can the rdma stack work inside the container? Will rds-ping work inside the container?
- mode should be sriov, i.e. -o mode=sriov.
- rds-ping won't work, as it is currently unsupported. Other applications such as rping should work, depending on which kernel version you are running; kernel 4.19/4.20 is required for RoCE.
- This plugin doesn't create any bridge.
- You can have your PF netdevice in the same subnet as the container VFs.

Here is an example:
```
ifconfig ens1f0
ens1f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 194.168.1.36 netmask 255.255.255.0 broadcast 194.168.1.255
inet6 fe80::268a:7ff:fe55:4660 prefixlen 64 scopeid 0x20
ether 24:8a:07:55:46:60 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 648 (648.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

$ docker network create -d sriov --subnet=194.168.1.0/24 -o netdevice=ens1f0 mynet
```
This will allow you to reach other nodes in the same subnet on other systems, and VF-to-PF communication will work too.
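For example, a minimal sketch of attaching a container to that network and checking reachability (the image name and the peer IP below are placeholders, not taken from this thread):
```
# Attach a container to the "mynet" network created above.
docker run --net=mynet -it --rm centos:7 bash

# Inside the container (the interface name and the peer address will differ):
ip addr show eth0        # the VF interface should have an address from 194.168.1.0/24
ping -c 3 194.168.1.1    # another host, or the PF, on the same subnet
```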
Thanks for the quick reply. I tried to create the network using SR-IOV on the same subnet, but it didn't work.
```
ifconfig re1
re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:1f txqueuelen 1000 (Ethernet)
RX packets 24737 bytes 1706634 (1.6 MiB)
RX errors 1692 dropped 0 overruns 0 frame 1692
TX packets 6911 bytes 465960 (455.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```
```
docker network create -d sriov --subnet=192.168.10.0/24 -o netdevice=re1 mynet
Error response from daemon: Pool overlaps with other one on this address space
```
Kernel version on the host:
```
uname -a
Linux scaqaj01adm02.us.oracle.com 4.14.35-1902.0.12.el7uek.x86_64 #2 SMP Sat Mar 23 10:27:18 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
```
Kernel version in the container:
```
uname -a
Linux racnode6 4.14.35-1902.0.12.el7uek.x86_64 #2 SMP Sat Mar 23 10:27:18 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
```
So the rdma stack is supported inside the container from kernel version 4.19? I am asking because when I create a macvlan network on the RoCE interface and assign it to a container, I can ping all other IPs, but /proc/sys/net/rds does not appear inside the container, so rds-ping fails. I understand rds-ping is not supported, but can rping work, and is rdma supported for RoCE from kernel 4.19?
lspci output:
```
lspci | grep Mel
54:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
54:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
54:03.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.6 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.7 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.6 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.7 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.6 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.7 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:06.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:06.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
74:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
74:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
94:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
94:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
b4:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
b4:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
```
ifconfig output:
```
ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
inet 172.16.1.15 netmask 255.255.255.0 broadcast 10.31.213.255
ether b0:26:28:2f:42:00 txqueuelen 1000 (Ethernet)
RX packets 1941744 bytes 2171090611 (2.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 497208 bytes 59359447 (56.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:7f:c0:04:da txqueuelen 0 (Ethernet)
RX packets 129235 bytes 6897362 (6.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 302056 bytes 505027213 (481.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1000 (Local Loopback)
RX packets 297267 bytes 30985391 (29.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 297267 bytes 30985391 (29.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo:1: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.2 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
lo:2: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.3 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
lo:3: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.4 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
re0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.9 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:1e txqueuelen 1000 (Ethernet)
RX packets 30956 bytes 2173450 (2.0 MiB)
RX errors 514 dropped 0 overruns 0 frame 514
TX packets 12153 bytes 874026 (853.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:1f txqueuelen 1000 (Ethernet)
RX packets 24762 bytes 1708514 (1.6 MiB)
RX errors 1692 dropped 0 overruns 0 frame 1692
TX packets 6936 bytes 467840 (456.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.11 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:26 txqueuelen 1000 (Ethernet)
RX packets 28286 bytes 1919878 (1.8 MiB)
RX errors 514 dropped 0 overruns 0 frame 514
TX packets 9786 bytes 637700 (622.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.12 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:27 txqueuelen 1000 (Ethernet)
RX packets 23159 bytes 1589838 (1.5 MiB)
RX errors 1690 dropped 0 overruns 0 frame 1690
TX packets 5559 bytes 362762 (354.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.13 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:2e txqueuelen 1000 (Ethernet)
RX packets 39767 bytes 2586470 (2.4 MiB)
RX errors 514 dropped 0 overruns 0 frame 514
TX packets 61958 bytes 5282890 (5.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.14 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:2f txqueuelen 1000 (Ethernet)
RX packets 30595 bytes 2169872 (2.0 MiB)
RX errors 1690 dropped 0 overruns 0 frame 1690
TX packets 11031 bytes 738886 (721.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.15 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:16 txqueuelen 1000 (Ethernet)
RX packets 26861 bytes 1834644 (1.7 MiB)
RX errors 514 dropped 0 overruns 0 frame 514
TX packets 9953 bytes 652546 (637.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
re7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2300
inet 192.168.10.16 netmask 255.255.255.0 broadcast 192.168.10.255
ether 50:6b:4b:df:17:17 txqueuelen 1000 (Ethernet)
RX packets 29980 bytes 2132858 (2.0 MiB)
RX errors 1690 dropped 0 overruns 0 frame 1690
TX packets 11153 bytes 749550 (731.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:e6:d3:af txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```
@psaini79 @psaini2018
I just checked the patches; kernel 4.20 or higher is required.
Yes, rping and rdma should work.
You should follow this post.
It is not too different from what you are doing; it is just for reference.
https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5
For the duplicate subnet issue, please share the docker version and the sriov plugin logs.
I suspect you hit this issue because you have multiple netdevices in the same subnet.
This is not an issue with the sriov plugin; it appears to be a failure coming from docker itself.
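One way to see which existing docker network already claims the overlapping address pool (a debugging sketch; the network names will differ on your host):
```
# List all docker networks and print the subnet each one claims,
# to spot the pool that overlaps with 192.168.10.0/24.
docker network ls
docker network inspect -f '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}} {{end}}' $(docker network ls -q)
```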
Thanks, you are right. I am able to create the network on the same subnet and can ping the target from the container.
```
[root@b1936cddf1d3 mofed_installer]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.10.50 netmask 255.255.255.0 broadcast 192.168.10.255
ether 8e:d4:4b:dc:dd:dd txqueuelen 1000 (Ethernet)
RX packets 5 bytes 376 (376.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5 bytes 376 (376.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```
The ping command to the target works:
```
[root@b1936cddf1d3 mofed_installer]# ping 192.168.10.17
PING 192.168.10.17 (192.168.10.17) 56(84) bytes of data.
64 bytes from 192.168.10.17: icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from 192.168.10.17: icmp_seq=2 ttl=64 time=0.078 ms
```
However, the rping command is failing. I executed the following command on the server, i.e. on 192.168.10.17:
```
rping -s -C -a 192.168.10.17 -v
```
And I executed the following command inside the container:
```
[root@b1936cddf1d3 mofed_installer]# rping -c -a 192.168.10.17 -v
cma event RDMA_CM_EVENT_ADDR_ERROR, error -19
waiting for addr/route resolution state 1
```
I created the container using the following command:
```
docker_rdma_sriov run --net=mynet --ip=192.168.10.50 -it mellanox/mlnx_ofed_linux-4.4-1.0.0.0-centos7.4 bash
```
@psaini79
Can you please share the output of running rping -d ...?
We will also likely need to see kernel ftraces if it doesn't work.
We haven't tried the MOFED user space with an upstream kernel.
Usually, with an upstream kernel, upstream rdma-core (any version) should be used.
With a MOFED kernel, the MOFED user space should be used.
So going forward, you might want to create an rdma-core based container image.
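For example, a rough sketch of building such an image (the package names assume a CentOS 7 base and are not from this thread; adjust them for your distribution):
```
# Build a small rdma-core based test image instead of the MOFED one.
cat > Dockerfile <<'EOF'
FROM centos:7
# librdmacm-utils provides rping; libibverbs-utils provides ibv_devinfo/ibv_devices.
RUN yum install -y librdmacm-utils libibverbs-utils iproute && yum clean all
CMD ["bash"]
EOF
docker build -t rdma-core-test .
```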
Please also share the docker run command that you use.
Did you follow the post I shared previously, listed below?
https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5
Yes, I followed the steps given in https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5, but I executed them from the 5th step onward. Also, the following command exits the container without any error:
```
docker run --net=host -v /usr/bin:/tmp rdma/container_tools_installer
```
Please find the output of rping -d below:
```
[root@b1936cddf1d3 mofed_installer]# rping -d -c -a 192.168.10.17 -v
client
verbose
created cm_id 0x97f1c0
cma_event type RDMA_CM_EVENT_ADDR_ERROR cma_id 0x97f1c0 (parent)
cma event RDMA_CM_EVENT_ADDR_ERROR, error -19
waiting for addr/route resolution state 1
destroy cm_id 0x97f1c0
```
I have one more question: is there any ETA for rds-ping to work inside the container?
@psaini2018
Please share the output of:
- $ ibdev2netdev (in the container)
- $ show_gids (in the container)
- $ uname -a (on the host)

What command did you use to run this container?
Please talk to Mellanox support for rds-ping.
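If those MOFED helper scripts are not installed in the image, roughly equivalent information can be read from sysfs (a sketch; the paths are standard kernel sysfs for RoCE devices, and the device names will differ):
```
# List the RDMA devices known to the kernel.
ls /sys/class/infiniband/
# Show which netdev backs port 1 of each RDMA device, and its first GID.
cat /sys/class/infiniband/*/ports/1/gid_attrs/ndevs/0
cat /sys/class/infiniband/*/ports/1/gids/0
```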
Those commands failed inside the container:
```
[root@9098b076cc9a mofed_installer]# ibdev2netdev
bash: ibdev2netdev: command not found
[root@9098b076cc9a mofed_installer]# show_gids
bash: show_gids: command not found

uname -a
Linux rdma-setup 4.14.35-1902.0.12.el7uek.x86_64 #2 SMP Sat Mar 23 10:27:18 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
```
@psaini2018 As we discussed yesterday in this thread, you need kernel 4.20. Please upgrade to it.
Do I need to upgrade the host kernel to 4.20? I just want to make sure, to avoid any rework. Also, for rds-ping, does an SR need to be opened through the Mellanox support portal, or is there a GitHub repo where that issue can be opened?
@psaini2018, yes, 4.20 or higher; 5.1 is even better. :-)
rds-ping is owned by Oracle; you should first resolve rds-ping support with Oracle before opening a Mellanox support case.
OK, and thanks a lot for your quick reply.
As per the following link, the Ethernet card inside the container is made available using IPoIB.
https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5
I have a question: what is the difference between an IPoIB device and a VM IPoIB device? Are they technically the same?
@psaini2018
Yes.
Can you please close this issue?
If you like the plugin, you can also star it. :-)