Using SR-IOV Interfaces in OpenShift Virtualization
Author: Brandon B. Jozsa
"Nothing is impossible, the word itself says 'I'm Possible'."
– Audrey Hepburn
- Part I: Introduction
- Part II: Verify Supported Networking Hardware
- Part III: Installing the SR-IOV OpenShift Operator
- Part IV: Creating SR-IOV Policies and Networks
- SR-IOV Network Node Policies
- SR-IOV Networks
- Part V: Creating Virtual Machines
- Part VI: Troubleshooting Tips
- Method: jq
- Method: jsonpath
Part I: Introduction
Using SR-IOV interfaces in OpenShift is really quite simple. For advanced use-cases and implementation details, have a look at OpenShift's documentation (4.12 at the time of this writing). What I will cover in this article is how to implement SR-IOV on a Single Node OpenShift (SNO) deployment, as you might have in a home lab or development environment.
Part II: Verify Supported Networking Hardware
The very first thing you need to do is verify your networking hardware. In general, you need to check a few things:
- Check the list of OpenShift Container Platform SR-IOV supported NICs documentation (at the time of writing 4.12)
- Check for a list of supported devices/drivers for the VNF you wish to run on OpenShift. Palo Alto VM series and Fortinet VM series both have great support documentation, which includes version and driver support.
- Ancillary equipment such as switches and cables
- Architecture and design
Finally, as a small note to "home lab" or other folks who may be newer to networking: plan very carefully. Switches and cabling should be part of your overall budget planning, and these ancillary considerations can really add up. On a positive note, networking prices are starting to drop because of CPU/architecture changes in the market. You can find a Mellanox ConnectX-4 for a relatively decent price, but make sure that you have the correct switch, modules, DACs, etc. If you have any questions, feel free to reach me on Twitter for help.
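One quick hardware sanity check, if you already have shell access to a node: ask the kernel how many VFs the NIC can expose. This is a minimal sketch that assumes the interface name ens4f0 used later in this article; substitute your own interface name:
cat /sys/class/net/ens4f0/device/sriov_totalvfs
A non-zero value means the physical function supports that many VFs; if the file doesn't exist at all, the NIC (or its current firmware/driver) isn't exposing SR-IOV support.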
Part III: Installing the SR-IOV OpenShift Operator
Now it's time to deploy the OpenShift Operator. This can be installed the same way that other OpenShift Operators are installed: using an OperatorGroup and a corresponding Subscription.
- First, create a namespace called openshift-sriov-network-operator. You can copy and paste the following to have this created for you:
cat << EOF | oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
EOF
- Next, create an OperatorGroup CR, based on the following YAML. Again, you can copy and paste this "as-is":
cat << EOF | oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF
- In order to create the subscription, you need the cluster version (major.minor). This can be mapped to a variable for step 4 with the following command:
OC_VERSION=$(oc version -o yaml | grep openshiftVersion | \
  grep -o '[0-9]*[.][0-9]*' | head -1)
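Before moving on, it's worth echoing the variable to confirm it captured a sensible value (on a 4.12 cluster, for example, this should print 4.12):
echo "${OC_VERSION}"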
- Lastly, create the subscription for the SR-IOV Operator with the following command:
cat << EOF | oc create -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "${OC_VERSION}"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE: Verify the installation progression with the following command:
oc get csv -n openshift-sriov-network-operator \
  -o custom-columns=Name:.metadata.name,Phase:.status.phase
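The operator is fully installed once the CSV reports a Succeeded phase. If you just want that phase as a single value, the jsonpath query below is a minimal sketch; it assumes the SR-IOV operator's CSV is the only one in this namespace:
oc get csv -n openshift-sriov-network-operator \
  -o jsonpath='{.items[0].status.phase}{"\n"}'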
Part IV: Creating SR-IOV Policies and Networks
With the SR-IOV Operator successfully deployed, you will need to create two objects that work together to expose VFs (virtual functions) to your virtual machines (VNFs): an SriovNetworkNodePolicy and an SriovNetwork.
SR-IOV Network Node Policies
Think of the SR-IOV Network Node Policy as a declarative method for configuring the NIC for a given node. In order to properly configure this device, you need to know a few things about your NIC and where it is installed in your server.
For example, let's say that I have an Intel XXV710-DA2 card installed in the 4th PCIe slot on a Dell T640 server. The first thing that I do is either log into the iDRAC (or iLO, if you're using an HP system), or search within the Dell PowerEdge Technical Guide for the T640 Series Servers. In section 7, under "Networking and PCIe" there is a "PCIe expansion slots" subheading. This tells me that PCIe slot 4 is a Full Length slot which supports x8 PCIe lanes/ports on CPU 2. This means I must have 2 CPUs installed, and considering that the XXV710 card requires x8 lanes according to ark.intel.com, I'm in good shape.
Next, we need to find the vendor and deviceID, so that we can add these items to our SriovNetworkNodePolicy manifest. The easiest way to find these is with the following command:
[core@itamae ~]$ lspci -nn | egrep Ethernet | egrep 710
89:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 [8086:158b] (rev 02)
89:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 [8086:158b] (rev 02)
[core@itamae ~]$
The vendor is 8086 (for Intel), and the card deviceID is 158b.
The next important thing to know is the interface you want to use for VFs. This is simple, as our interface name is ens4f0. But we may want to use some VFs for one namespace/project, and use another group of VFs for another namespace. This isn't a problem, as we're going to assign half of our VFs to the project/namespace jinkit-vms.
So here's our manifest:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-sriov-710-ens4f0-0031
namespace: openshift-sriov-network-operator
spec:
deviceType: vfio-pci
isRdma: false
nicSelector:
deviceID: 158b
pfNames:
- 'ens4f0#0-31'
vendor: '8086'
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: 'true'
numVfs: 64
priority: 97
resourceName: 710_ens4f0_0031
The best way I can explain each of these fields is to show you the best oc or kubectl tool that you're probably not using:
❯ oc explain SriovNetworkNodePolicy.spec
KIND: SriovNetworkNodePolicy
VERSION: sriovnetwork.openshift.io/v1
RESOURCE: spec <Object>
DESCRIPTION:
SriovNetworkNodePolicySpec defines the desired state of
SriovNetworkNodePolicy
FIELDS:
deviceType <string>
The driver type for configured VFs. Allowed value "netdevice", "vfio-pci".
Defaults to netdevice.
eSwitchMode <string>
NIC Device Mode. Allowed value "legacy","switchdev".
isRdma <boolean>
RDMA mode. Defaults to false.
linkType <string>
NIC Link Type. Allowed value "eth", "ETH", "ib", and "IB".
mtu <integer>
MTU of VF
needVhostNet <boolean>
mount vhost-net device. Defaults to false.
nicSelector <Object> -required-
NicSelector selects the NICs to be configured
nodeSelector <map[string]string> -required-
NodeSelector selects the nodes to be configured
numVfs <integer> -required-
Number of VFs for each PF
priority <integer>
Priority of the policy, higher priority policies can override lower ones.
resourceName <string> -required-
SRIOV Network device plugin endpoint resource name
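Once the policy above is applied, the operator configures the node (which may involve a drain and reboot), and the device plugin advertises the VF pool as an extended resource, by default under the openshift.io/ prefix. As a quick check - a minimal sketch that assumes the node name itamae from the lspci output earlier - you can query the node's allocatable resources directly:
oc get node itamae \
  -o jsonpath='{.status.allocatable.openshift\.io/710_ens4f0_0031}{"\n"}'
If everything applied cleanly, this should report 32 for our example, matching the 0-31 VF range carved out in pfNames.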
Now let's talk about SR-IOV Networks, because these are going to be namespaced objects which can be used in multi-tenant environments.
SR-IOV Networks
Put simply, an SriovNetwork is a namespaced object (so think multi-tenancy) that maps to a Network Attachment Definition, which in turn can be used for a virtual machine. This needs to be bound to the resourceName field on the SriovNetworkNodePolicy we created earlier - so in our example above, this is 710_ens4f0_0031. This also means that administrators for the jinkit-vms namespace have VFs 0-31 available (per the pfNames ens4f0#0-31).
Another thing that the SriovNetwork does is configure the IPAM type for the Network Attachment Definition. In the example below, we're saying that there's a DHCP server on VLAN 55:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: 710-ens4f0-vlan55-dhcp
namespace: openshift-sriov-network-operator
spec:
ipam: |-
{
"ipam": {
"type": "dhcp"
}
}
networkNamespace: jinkit-vms
resourceName: 710_ens4f0_0031
vlan: 55
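When this SriovNetwork is created, the operator generates a matching NetworkAttachmentDefinition in the networkNamespace (jinkit-vms in this case), named after the SriovNetwork itself. You can confirm it exists with a standard query; nothing here is specific to my environment beyond the namespace name:
oc get network-attachment-definitions -n jinkit-vms
The entries listed there are the network names you reference from a virtual machine's spec, as shown in the next section.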
Part V: Creating Virtual Machines
Now we're on to the most important part, right? We need to create a virtual machine, and ensure that this virtual machine can claim a VF that's been prepared by the SriovNetworkNodePolicy and SriovNetwork.
The important thing to remember is that the virtual machine manifest will reference networking in two places:
virtualmachine.spec.template.spec.networks
virtualmachine.spec.template.spec.domain.devices.interfaces
The mapping looks like the following:
spec:
template:
spec:
domain:
devices:
interfaces:
- macAddress: '02:9b:48:00:10:03'
model: virtio
name: fortigate-test-02-port1
sriov: {}
- macAddress: '02:9b:48:00:10:04'
model: virtio
name: fortigate-test-02-port2
sriov: {}
- macAddress: '02:9b:48:00:10:05'
model: virtio
name: fortigate-test-02-port3
sriov: {}
- macAddress: '02:9b:48:00:10:44'
model: virtio
name: fortigate-test-02-port4
sriov: {}
networks:
- multus:
networkName: 710-ens4f0-vlan90-dhcp
name: fortigate-test-02-port1
- multus:
networkName: 710-ens4f0-vlan50-dhcp
name: fortigate-test-02-port2
- multus:
networkName: 710-ens4f1-vlan40-dhcp
name: fortigate-test-02-port3
- multus:
networkName: 710-ens4f1-vlan20-dhcp
name: fortigate-test-02-port4
As you can see, the manifest above attaches four SR-IOV VFs to a Fortigate VM series NGFW VNF.
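Once the VM is running, you can also confirm from the KubeVirt side that the SR-IOV interfaces were attached by looking at the VirtualMachineInstance status. The command below is a sketch; it assumes the VM is named fortigate-test-02 and lives in the jinkit-vms namespace, which the manifest fragment above doesn't actually show:
oc get vmi fortigate-test-02 -n jinkit-vms \
  -o jsonpath='{range .status.interfaces[*]}{.name}{"\t"}{.mac}{"\n"}{end}'
Each interface name and MAC address should line up with the devices.interfaces entries in the manifest.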
Part VI: Troubleshooting Tips
Let's say that you're working with a cluster that has several, or even dozens, of virtual machines, but you only want to see the SR-IOV networks attached to them. I can at least help you get started with a command.
Method: jq
If you're using the popular tool jq, then you can use the following to list the secondary (Multus) network names attached to every VM in your cluster, which includes your SR-IOV networks (look closely at the path):
❯ oc get vms -A -o json | jq '.items[].spec.template.spec.networks[].multus.networkName'
"jinkit-vms/710-ens4f1-vlan4-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f1-vlan25-dhcp"
"710-ens4f0-vlan55-dhcp"
"jinkit-vms/810-ens6f1-vlan4-dhcp"
"jinkit-vms/810-ens6f1-vlan4-dhcp"
"710-ens4f1-vlan20-dhcp"
"jinkit-vms/810-ens6f1-vlan4-dhcp"
null
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan50-dhcp"
"710-ens4f1-vlan40-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan55-dhcp"
"710-ens4f1-vlan45-dhcp"
"710-ens4f1-vlan25-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan50-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan50-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f1-vlan25-dhcp"
❯ oc get vms -A -o json | jq '.items[].spec.template.spec.networks[].multus.networkName' | wc -l
26
Method: jsonpath
What if you don't have jq installed on your system? No problem - we've got you covered. You can use the following with oc or kubectl as well:
❯ oc get vms -A -o=jsonpath='{range .items[*]}{range .spec.template.spec.networks[*]}{.multus.networkName}{"\n"}{end}'
jinkit-vms/710-ens4f1-vlan4-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f1-vlan25-dhcp
710-ens4f0-vlan55-dhcp
jinkit-vms/810-ens6f1-vlan4-dhcp
jinkit-vms/810-ens6f1-vlan4-dhcp
710-ens4f1-vlan20-dhcp
jinkit-vms/810-ens6f1-vlan4-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan50-dhcp
710-ens4f1-vlan40-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan55-dhcp
710-ens4f1-vlan45-dhcp
710-ens4f1-vlan25-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan50-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan50-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f1-vlan25-dhcp
❯ oc get vms -A -o=jsonpath='{range .items[*]}{range .spec.template.spec.networks[*]}{.multus.networkName}{"\n"}{end}' | wc -l
26
The choice is yours!