The purpose of this repository is to troubleshoot SR-IOV interfaces that are bound to the vfio-pci driver and to forward packets from user space to the pod's kernel space on interface dp0.
This repository should really be named dpdk-testpmd. The testpmd container allows running a privileged pod which runs testpmd. testpmd has the pod's single vfio-pci interface connected on one end and a tunnel interface into the pod's kernel on the other end. All packets are forwarded by testpmd between the vfio-pci interface and the pod's kernel space.
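The resulting forwarding path looks roughly like this (a simplified sketch; dp0 is the tap interface created by testpmd's net_tap vdev, as can be seen in the entrypoint script below):

SR-IOV VF (vfio-pci, user space)
        |
   testpmd (forwarding)
        |
net_tap vdev  <-->  dp0 (pod kernel space, carries $IP_ADDRESS)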
Make sure to check all prerequisites first. See the Prerequisites section for more information.
Deploy the net-attach-def in the application namespace:
# oc new-project sriov-testing
# cat <<'EOF' > sriovnetwork-enp5s0f0.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net-enp5s0f0-vfiopci
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: sriov-testing
  ipam: '{ "type": "static" }'
  vlan: 109
  resourceName: enp5s0f0Vfiopci
  trust: "on"
  capabilities: '{ "mac": true, "ips": true }'
EOF
# oc apply -f sriovnetwork-enp5s0f0.yaml
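The SR-IOV operator should now have rendered a NetworkAttachmentDefinition into the target namespace. As a quick sanity check (namespace and resource names as used above):
# oc get network-attachment-definitions -n sriov-testing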
Now, adjust the testpmd.yaml file according to your settings:
apiVersion: v1
kind: Pod
metadata:
  name: testpmd
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {
        "name": "sriov-net-enp5s0f0-vfiopci",
        "mac": "20:04:0f:f1:88:01",
        "ips": ["192.168.109.10/24"]
      }
    ]'
spec:
  containers:
  - name: testpmd
    image: quay.io/akaris/dpdk-kni:<current tag>
    env:
    - name: PCI_DEVICE_FILTER
      value: "PCIDEVICE_OPENSHIFT_IO"
    - name: IP_ADDRESS
      value: "192.168.109.10/24"
    imagePullPolicy: IfNotPresent
    command: ["/bin/bash", "-c", "/entrypoint.sh"]
    resources:
      limits:
        hugepages-1Gi: "1Gi"
        memory: "2Gi"
        cpu: "4000m"
      requests:
        hugepages-1Gi: "1Gi"
        memory: "2Gi"
        cpu: "4000m"
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
The PCI_DEVICE_FILTER should be fine as-is for OpenShift deployments. The entrypoint script will run env | grep PCIDEVICE_OPENSHIFT_IO and take the first value that it finds as the PCI address of the SR-IOV interface.
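For illustration, the SR-IOV device plugin injects the allocated VF's PCI address via an environment variable; with the resource name used above, it would look something like the following (the exact variable suffix and PCI address depend on your resource name and hardware, so treat this as an example only):

PCIDEVICE_OPENSHIFT_IO_ENP5S0F0VFIOPCI=0000:05:02.0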
The IP_ADDRESS must be adjusted and must contain the IP address and subnet mask that the vfio-pci interface (or rather, its kernel tunnel endpoint dp0) should have. The IP address will be added statically.
Adjust the k8s.v1.cni.cncf.io/networks annotation according to your needs and environment.
Adjust imagePullPolicy: IfNotPresent if needed.
After adjusting the file, apply it:
# oc apply -f testpmd.yaml
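Once the pod is up, it may be worth verifying that testpmd started and that dp0 received its address (pod name as defined above):
# oc get pod testpmd
# oc logs testpmd
# oc exec testpmd -- ip address show dp0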
First of all, deploy the SR-IOV operator and the Performance Addon Operator. The following steps are for OpenShift 4.6 and will vary depending on the OpenShift version. You will need working SR-IOV, CPU isolation, and hugepage memory.
Install the Performance Addon Operator (https://docs.openshift.com/container-platform/4.6/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html):
# cat <<'EOF' > pao-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-performance-addon-operator
EOF
# oc apply -f pao-namespace.yaml
# cat <<'EOF' > pao-operatorgroup.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-performance-addon-operator
  namespace: openshift-performance-addon-operator
spec:
  targetNamespaces:
  - openshift-performance-addon-operator
EOF
# oc apply -f pao-operatorgroup.yaml
# cat <<'EOF' > pao-sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-performance-addon-operator-subscription
  namespace: openshift-performance-addon-operator
spec:
  channel: "4.6"
  name: performance-addon-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
# oc create -f pao-sub.yaml
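Before moving on, it can help to confirm that the operator's CSV reached the Succeeded phase:
# oc get csv -n openshift-performance-addon-operator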
Once the Performance Addon Operator has been installed successfully, go ahead and create the PerformanceProfile and wait until it is deployed correctly. The nodes should have 1G hugepages and CPU pinning enabled after this:
# oc label mcp worker machineconfiguration.openshift.io/role=worker
# cat <<'EOF' > pp.yaml
apiVersion: performance.openshift.io/v1
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: "8-39"
    reserved: "0-7"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      count: 16
      node: 0
    - size: "1G"
      count: 16
      node: 1
  nodeSelector:
    node-role.kubernetes.io/worker: ""
EOF
# oc apply -f pp.yaml
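Applying the profile triggers a MachineConfig rollout, which typically reboots the worker nodes. One possible way to watch the rollout and to spot-check hugepages afterwards (replace <worker-node> with an actual node name):
# oc get mcp worker -w
# oc debug node/<worker-node> -- chroot /host grep -i hugepages /proc/meminfo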
Make sure to set up the SR-IOV operator (https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/installing-sriov-operator.html):
# cat <<'EOF' > sriov-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  labels:
    openshift.io/run-level: "1"
EOF
# oc apply -f sriov-namespace.yaml
# cat <<'EOF' > sriov-operatorgroup.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF
# oc apply -f sriov-operatorgroup.yaml
# cat <<'EOF' > sriov-sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.6"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
# oc apply -f sriov-sub.yaml
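As with the Performance Addon Operator, confirming the install before continuing can save time:
# oc get csv -n openshift-sriov-network-operator
# oc get pods -n openshift-sriov-network-operator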
After installation of the SR-IOV operator, deploy the SriovNetworkNodePolicy:
# cat <<'EOF' > ./networkpolicy-vfiopci.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-vfiopci
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0Vfiopci
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 10
  mtu: 1500
  numVfs: 5
  nicSelector:
    pfNames: ["enp5s0f0"]
  deviceType: "vfio-pci"
  isRdma: false
EOF
# oc apply -f ./networkpolicy-vfiopci.yaml
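The policy makes the operator create the VFs on all matching nodes, which can take a few minutes. A possible way to verify the result (replace <worker-node>; with numVfs: 5 as above, the node's allocatable resources should list openshift.io/enp5s0f0Vfiopci with a count of 5):
# oc get sriovnetworknodestates -n openshift-sriov-network-operator
# oc get node <worker-node> -o jsonpath='{.status.allocatable}'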
Any version tag matching tags/v.* will be built automatically by quay.io.
The testpmd container can be built with:
make build-testpmd
The DPDK binaries can be built with:
make build-dpdk
After the DPDK binaries are built, the following entrypoint script is responsible for all of the other magic.
The script runs a function in the background which waits for interface dp0 to show up. When it does, it attaches $IP_ADDRESS to interface dp0.
The script runs dpdk-ethtool to retrieve the SR-IOV interface's MAC address. It then runs dpdk-testpmd.
The script runs in verbose mode (-x), so check the script logs if anything goes wrong.
Also, the script will not exit; instead, it sleeps forever if something goes wrong. That way, you can oc rsh into the pod, try running the commands yourself, and play with the values.
#!/bin/bash -x
# https://doc.dpdk.org/guides/linux_gsg/linux_eal_parameters.html
# https://doc.dpdk.org/guides/sample_app_ug/kernel_nic_interface.html

# Wait for interface dp0 to show up, then attach $IP_ADDRESS to it.
# "ip address add" fails as long as dp0 does not exist yet, hence the retry loop.
function addip {
    if [ "$IP_ADDRESS" != "" ]; then
        while true; do
            echo "Adding IP address to dp0"
            if ip address add dev dp0 $IP_ADDRESS; then
                break
            fi
            echo "Failed adding IP address to dp0, sleeping for 5 seconds"
            sleep 5
        done
    fi
}

echo "Get PCI_DEVICE_ID from filter expression"
PCI_DEVICE_FILTER=${PCI_DEVICE_FILTER:-PCIDEVICE_OPENSHIFT_IO}
PCI_DEVICE_ID=$(env | grep $PCI_DEVICE_FILTER | awk -F '=' '{print $NF}' | head -1)

echo "Get MAC ADDRESS"
MACADDR=$(echo "macaddr 0" | /dpdk-ethtool -a $PCI_DEVICE_ID | awk '/Port 0 MAC Address/ {print $NF}')

echo "Get pinned CPUs"
CPUS=$(cat /proc/self/status | awk '/Cpus_allowed_list/ {print $NF}')

echo "Run addip script in background"
addip &

echo "Run testpmd and forward everything between dp0 tunnel interface and vfio interface"
( echo 'start' ; while true ; do echo 'show port stats all' ; sleep 60 ; done ) | /dpdk-testpmd --log-level=10 --legacy-mem --vdev=net_tap1,iface=dp0,mac=${MACADDR} -l $CPUS -n 4 -a $PCI_DEVICE_ID -- --nb-cores=1 --nb-ports=2 --total-num-mbufs=2048 -i

echo "FAILURE! If we got here, this means that it's time for troubleshooting. Testpmd did not run or crashed!"
sleep infinity
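If the script ends up in the sleep infinity branch, a straightforward way to debug is to rerun its steps by hand from inside the pod (replace <pci-address> with the value of the PCIDEVICE_* environment variable):
# oc rsh testpmd
# env | grep PCIDEVICE_OPENSHIFT_IO
# /dpdk-ethtool -a <pci-address>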