This document describes how to run Kube-OVN with OVS-DPDK.
- Kubernetes >= 1.11
- Docker >= 1.12.6
- OS: CentOS 7.5/7.6/7.7, Ubuntu 16.04/18.04
- 1GB Hugepages on the host
- On the host, modify the file /etc/default/grub
- Append the following to the setting GRUB_CMDLINE_LINUX:
default_hugepagesz=1GB hugepagesz=1G hugepages=X
Where X is the number of 1GB hugepages you wish to create on your system. Your use case determines how many hugepages are required, and the available system memory determines the maximum possible. A hypothetical example of the full kernel command line is shown below, after the verification output.
- Update Grub:
- On legacy boot systems run:
grub2-mkconfig -o /boot/grub2/grub.cfg
- On EFI boot systems run:
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
NOTE: This file path is an example from a CentOS system; it will differ on other distros.
- Reboot the system
- To confirm hugepages are configured, run:
grep Huge /proc/meminfo
Example Output:
AnonHugePages: 2105344 kB
HugePages_Total: 32
HugePages_Free: 30
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
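For reference, a hypothetical GRUB_CMDLINE_LINUX line after the edit above, reserving 32 x 1GB hugepages, might look like the following (the pre-existing options on your system will differ):

```bash
# Hypothetical example only; keep your system's existing options and append the hugepage settings
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet default_hugepagesz=1GB hugepagesz=1G hugepages=32"
```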
Open vSwitch is highly configurable using other_config options, as described in the Open vSwitch manual. All of these options can be set through a simple config file, /opt/ovs-config/config.cfg, which is mounted into the ovs-ovn pod. The file contains a list of other_config options, one per line.
Example:
dpdk-socket-mem="1024,1024"
dpdk-init=true
pmd-cpu-mask=0x4
dpdk-lcore-mask=0x2
dpdk-hugepage-dir=/dev/hugepages
This example config enables DPDK support with 1024MB of hugepages for each of NUMA node 0 and NUMA node 1, a PMD CPU mask of 0x4, an lcore mask of 0x2, and hugepages located in /dev/hugepages.
If the file does not exist when OVS initializes, a default configuration file will be created with the following values:
dpdk-socket-mem="1024"
dpdk-init=true
dpdk-hugepage-dir=/dev/hugepages
Note: If you would like to initialize Open vSwitch with more socket memory than 1024MB, you must reserve that memory for the ovs-ovn pod by editing the hugepages-1G value of the ovs-ovn pod in the install.sh script. For example, to initialize Open vSwitch using dpdk-socket-mem="1024,1024", the minimal value is hugepages-1G: 2Gi.
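As a minimal sketch, assuming the example values above suit your system, the config file can be created on the host like this (adjust the values to your NUMA layout and CPU reservations):

```bash
# Sketch: create the OVS-DPDK config consumed by the ovs-ovn pod (values are examples only)
mkdir -p /opt/ovs-config
cat <<EOF > /opt/ovs-config/config.cfg
dpdk-socket-mem="1024,1024"
dpdk-init=true
pmd-cpu-mask=0x4
dpdk-lcore-mask=0x2
dpdk-hugepage-dir=/dev/hugepages
EOF
```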
- Download the installation script:
wget https://raw.githubusercontent.com/alauda/kube-ovn/release-1.4/dist/images/install.sh
- Use vim to edit the script variables to meet your requirements:
REGISTRY="index.alauda.cn/alaudak8s"
NAMESPACE="kube-system" # The ns to deploy kube-ovn
POD_CIDR="10.16.0.0/16" # Do NOT overlap with NODE/SVC/JOIN CIDR
SVC_CIDR="10.96.0.0/12" # Do NOT overlap with NODE/POD/JOIN CIDR
JOIN_CIDR="100.64.0.0/16" # Do NOT overlap with NODE/POD/SVC CIDR
LABEL="node-role.kubernetes.io/master" # The node label to deploy OVN DB
IFACE="" # The nic to support container network, if empty will use the nic that the default route use
VERSION="v1.1.0"
- Run the installation script, making sure to include the --with-dpdk= flag followed by the required DPDK version.
bash install.sh --with-dpdk=19.11
Note: The currently supported version is DPDK 19.11.
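Once the script finishes, you can check that the Kube-OVN components, including the DPDK-enabled ovs-ovn pods, have come up, for example:

```bash
# Verify the Kube-OVN control plane and OVS pods are Running
kubectl -n kube-system get pods -o wide | grep -E 'kube-ovn|ovs-ovn'
```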
The DPDK-enabled vhost-user sockets provided by OVS-DPDK are not suitable for use as the default network of a Kubernetes pod. We must retain the OVS (kernel) interface provided by Kube-OVN, and the DPDK socket(s) must be requested as additional interface(s).
To attach multiple network interfaces to a pod we can use the Multus-CNI plugin. To install Multus, follow the Multus quick start guide. During installation, Multus should detect that Kube-OVN has already been installed as the default Kubernetes network plugin and will automatically configure itself so that Kube-OVN continues to be the default network plugin for all pods.
Note: Multus treats the lexicographically (alphabetically) first configuration file in the /etc/cni/net.d directory as the existing default network. If another plugin has the lexicographically first config file at this location, it will be considered the default network. Rename configuration files accordingly before installing Multus.
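For example, you can inspect the directory and rename files if needed; the file names below are purely illustrative and will differ on your system:

```bash
# List the CNI configs; the lexicographically first file is treated as the default network
ls /etc/cni/net.d/
# Hypothetical output: 00-other-plugin.conf  01-kube-ovn.conflist
# If another config sorts before the Kube-OVN one, rename it so Kube-OVN sorts first:
mv /etc/cni/net.d/00-other-plugin.conf /etc/cni/net.d/99-other-plugin.conf
```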
With Multus installed, additional network interfaces can now be requested within a pod spec.
There is now a containerized instance of OVS-DPDK running on the node. Kube-OVN can provide all of its regular (kernel) functionality, and Multus is in place to enable pods to request the additional OVS-DPDK interfaces. However, OVS-DPDK does not provide regular netdev interfaces, but vhost-user sockets. These sockets cannot be attached to a pod in the usual manner, where the netdev is moved to the pod network namespace; instead, they must be mounted into the pod. Kube-OVN (at least currently) does not have this socket-mounting ability. For this functionality we can use the Userspace CNI Network Plugin.
Note: These steps assume Go has already been installed, and the GOPATH env var has been set.
go get github.com/intel/userspace-cni-network-plugin
cd $GOPATH/src/github.com/intel/userspace-cni-network-plugin
make clean
make install
make
cp userspace/userspace /opt/cni/bin
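The userspace binary must be present in /opt/cni/bin on every node that will run pods with OVS-DPDK interfaces; a quick check:

```bash
# Confirm the Userspace CNI binary is installed on this node
ls -l /opt/cni/bin/userspace
```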
A NetworkAttachmentDefinition (NAD) is used to represent network attachments. In this case we need a NAD to represent the network interfaces provided by Userspace CNI, i.e. the OVS-DPDK interfaces. It will then be possible to request this network attachment within a pod spec, and Multus will attach these to the pod as secondary interfaces in addition to the preconfigured default network, i.e. the Kube-OVN provided OVS (kernel) interfaces.
Create the NetworkAttachmentDefinition
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-dpdk-br0
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "userspace",
      "name": "ovs-dpdk-br0",
      "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
      "logFile": "/var/log/userspace-ovs-dpdk-br0.log",
      "logLevel": "debug",
      "host": {
        "engine": "ovs-dpdk",
        "iftype": "vhostuser",
        "netType": "bridge",
        "vhost": {
          "mode": "server"
        },
        "bridge": {
          "bridgeName": "br0"
        }
      },
      "container": {
        "engine": "ovs-dpdk",
        "iftype": "vhostuser",
        "netType": "interface",
        "vhost": {
          "mode": "client"
        }
      }
    }
EOF
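You can confirm the definition was created:

```bash
kubectl get network-attachment-definitions
```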
It should now be possible to request the interfaces provided by Userspace CNI as annotations within a pod spec. The example below requests two OVS-DPDK interfaces; these will be in addition to the default network.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-dpdk-br0, ovs-dpdk-br0
Userspace-CNI is intended to run in an environment where OVS-DPDK is installed directly on the host, rather than in a container. Userspace-CNI makes calls to OVS-DPDK using an application called ovs-vsctl. With a containerized OVS-DPDK, this application is no longer available on the host. The following is a workaround to take ovs-vsctl calls made from the host and direct them to the appropriate Kube-OVN container running OVS-DPDK.
cat <<'EOF' > /usr/local/bin/ovs-vsctl
#!/bin/bash
ovsCont=$(docker ps | grep kube-ovn | grep ovs-ovn | grep -v pause | awk '{print $1}')
docker exec "$ovsCont" ovs-vsctl "$@"
EOF
chmod +x /usr/local/bin/ovs-vsctl
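With the wrapper in place, ovs-vsctl commands issued on the host are executed inside the ovs-ovn container, for example:

```bash
# Both commands run inside the containerized OVS-DPDK via the wrapper above
ovs-vsctl show
ovs-vsctl get Open_vSwitch . other_config
```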
CPU masking is not necessary, but some advanced users may wish to use this feature in OVS-DPDK. When starting OVS-DPDK, ovs-vsctl can be used to configure a CPU mask. This should be used with something like CPU-Manager-for-Kubernetes (CMK). Configuration of such a setup is complex and specific to each system, and is out of the scope of this document. Please consult the OVS-DPDK and CMK documentation.
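For illustration only, a PMD CPU mask can also be changed at runtime through other_config; the mask below is an arbitrary example and must match CPUs actually reserved on your system:

```bash
# Example only: pin OVS-DPDK PMD threads to CPUs 2 and 3 (mask 0xC)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC
```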
A sample Kubernetes pod running a DPDK-enabled Docker image.
Create the Dockerfile and name it Dockerfile.dpdk:
FROM centos:8
ENV DPDK_VERSION=19.11.1
ENV DPDK_TARGET=x86_64-native-linuxapp-gcc
ENV DPDK_DIR=/usr/src/dpdk-stable-${DPDK_VERSION}
RUN dnf groupinstall -y 'Development Tools'
RUN dnf install -y wget numactl-devel
RUN cd /usr/src/ && \
wget http://fast.dpdk.org/rel/dpdk-${DPDK_VERSION}.tar.xz && \
tar xf dpdk-${DPDK_VERSION}.tar.xz && \
rm -f dpdk-${DPDK_VERSION}.tar.xz && \
cd ${DPDK_DIR} && \
sed -i s/CONFIG_RTE_EAL_IGB_UIO=y/CONFIG_RTE_EAL_IGB_UIO=n/ config/common_linux && \
sed -i s/CONFIG_RTE_LIBRTE_KNI=y/CONFIG_RTE_LIBRTE_KNI=n/ config/common_linux && \
sed -i s/CONFIG_RTE_KNI_KMOD=y/CONFIG_RTE_KNI_KMOD=n/ config/common_linux && \
make install T=${DPDK_TARGET} DESTDIR=install
Build the Docker image and tag it as dpdk:19.11. This build will take some time.
docker build -t dpdk:19.11 -f Dockerfile.dpdk .
Create the pod spec and name it pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  generateName: testpmd-dpdk-
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-dpdk-br0, ovs-dpdk-br0
spec:
  containers:
  - name: testpmd-dpdk
    image: dpdk:19.11
    imagePullPolicy: Never
    securityContext:
      privileged: true
    command: ["tail", "-f", "/dev/null"]
    resources:
      requests:
        hugepages-1Gi: 2Gi
        memory: 2Gi
      limits:
        hugepages-1Gi: 2Gi
        memory: 2Gi
    volumeMounts:
    - mountPath: /hugepages
      name: hugepages
    - mountPath: /vhu
      name: vhu
  volumes:
  - name: vhu
    hostPath:
      path: /var/run/openvswitch/
  - name: hugepages
    emptyDir:
      medium: HugePages
  restartPolicy: Never
Run the pod.
kubectl create -f pod.yaml
The pod will be created with a kernel OVS interface, provided by Kube-OVN, as the default network. In addition, two secondary interfaces will be available within the pod as socket files located under /vhu/.
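For example, the sockets can be listed from inside the pod; the pod name suffix below is generated by Kubernetes and will differ:

```bash
# Find the generated pod name, then list the vhost-user sockets mounted at /vhu/
kubectl get pods | grep testpmd-dpdk
kubectl exec -it testpmd-dpdk-xxxxx -- ls /vhu/
```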
The above pod spec mounts the directory /var/run/openvswitch/ into the pod. This is the default location where OVS-DPDK creates its socket files, meaning that with this configuration all socket files are visible to all pods. It may be desirable to ensure that only the socket files created for a pod are visible within that pod. Userspace-CNI provides the option of mounting a unique directory containing only the relevant socket files.
The pod spec needs to be updated as shown below. The name of the volumeMount needs to be shared-dir, and the hostPath needs to be updated to include the unique directory for this pod. In this case we call the unique directory pod1/. When this pod is created, a new directory pod1/ will be created under /var/run/openvswitch/. Userspace-CNI will then place only the relevant socket files in this directory, and this directory is mounted into the pod where it will appear as /vhu/.
apiVersion: v1
kind: Pod
metadata:
  generateName: testpmd-dpdk-
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-dpdk-br0, ovs-dpdk-br0
spec:
  containers:
  - name: testpmd-dpdk
    image: dpdk:19.11
    imagePullPolicy: Never
    securityContext:
      privileged: true
    command: ["tail", "-f", "/dev/null"]
    resources:
      requests:
        hugepages-1Gi: 2Gi
        memory: 2Gi
      limits:
        hugepages-1Gi: 2Gi
        memory: 2Gi
    volumeMounts:
    - mountPath: /hugepages
      name: hugepages
    - mountPath: /vhu
      name: shared-dir
  volumes:
  - name: shared-dir
    hostPath:
      path: /var/run/openvswitch/pod1/
  - name: hugepages
    emptyDir:
      medium: HugePages
  restartPolicy: Never
Finally, we need to tell Userspace-CNI where it can find the newly generated socket files, as this default location can be configured and changed. For a Kube-OVN install, this location will be /var/run/openvswitch/. This location is provided to Userspace-CNI as an environment variable. Set this environment variable and restart kubelet:
echo "OVS_SOCKDIR=\"/var/run/openvswitch/\"" >> /var/lib/kubelet/kubeadm-flags.env
systemctl daemon-reload && systemctl restart kubelet
To run TestPMD:
testpmd -m 1024 -c 0xC --file-prefix=testpmd_ --vdev=net_virtio_user0,path=<path-to-socket-file1> --vdev=net_virtio_user1,path=<path-to-socket-file2> --no-pci -- --no-lsc-interrupt --auto-start --tx-first --stats-period 1
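The socket paths can be found by listing /vhu/ inside the pod. A hypothetical invocation (the socket file names are generated per pod and will differ) might look like:

```bash
# Inside the pod: discover the vhost-user sockets, then point testpmd at them
ls /vhu/
# Hypothetical output: f9ba6a8a-net1  f9ba6a8a-net2
testpmd -m 1024 -c 0xC --file-prefix=testpmd_ \
  --vdev=net_virtio_user0,path=/vhu/f9ba6a8a-net1 \
  --vdev=net_virtio_user1,path=/vhu/f9ba6a8a-net2 \
  --no-pci -- --no-lsc-interrupt --auto-start --tx-first --stats-period 1
```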