
Using SR-IOV Interfaces in OpenShift Virtualization

May 29, 2023

Author: Brandon B. Jozsa

"Nothing is impossible, the word itself says 'I'm Possible'."
Audrey Hepburn

Table of Contents

- Part I: Introduction
- Part II: Verify Supported Networking Hardware
- Part III: Installing the SR-IOV OpenShift Operator
- Part IV: Creating SR-IOV Policies and Networks
    - SR-IOV Network Node Policies
    - SR-IOV Networks
- Part V: Creating Virtual Machines
- Part VI: Troubleshooting Tips
    - Method: jq
    - Method: jsonpath

Part I: Introduction

Using SR-IOV interfaces in OpenShift is really quite simple. For advanced use cases and implementation details, have a look at OpenShift's documentation (4.12 at the time of this writing). What I will cover in this article is how to implement SR-IOV on a Single Node OpenShift (SNO) device, as in the case of a home lab or development environment.

Part II: Verify Supported Networking Hardware

The very first thing you need to do is verify your networking hardware. In general, you need to check a few things:

  • Check the list of SR-IOV supported NICs in the OpenShift Container Platform documentation (4.12 at the time of writing)
  • Check for a list of supported devices/drivers for the VNF you wish to run on OpenShift. Palo Alto VM series and Fortinet VM series both have great support documentation, which includes version and driver support.
  • Ancillary equipment such as switches and cables
  • Architecture and design

Finally, as a small note to "home lab" or other folks who may be newer to networking: plan very carefully. Switches and cabling should be part of your overall budget planning, and these ancillary considerations can really add up. On a positive note, networking prices are starting to drop because of CPU/architecture changes in the market. You can find a Mellanox ConnectX-4 for a relatively decent price, but make sure that you have the correct switch, modules, DACs, etc. If you have any questions, feel free to reach me on Twitter for help.
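
Before committing to the operator install, it's also worth sanity-checking what a node actually sees, and you can do this from a debug pod rather than SSH. This is a minimal sketch, assuming a node named itamae (the example host used later in this article); substitute whatever oc get nodes shows in your cluster:

oc debug node/itamae -- chroot /host lspci -nn | grep -i ethernet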

Part III: Installing the SR-IOV OpenShift Operator

Now it's time to deploy the OpenShift Operator. This can be installed the same way that other OpenShift Operators are installed: using an OperatorGroup and a corresponding Subscription.

  1. First, create a namespace called openshift-sriov-network-operator. You can copy and paste the following to have this created for you:
    cat << EOF | oc create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sriov-network-operator
      annotations:
        workload.openshift.io/allowed: management
    EOF
    
  2. Next, create an OperatorGroup CR, based on the following YAML. Again, you can copy and paste this "as-is":
    cat << EOF | oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator
    EOF
    
  3. In order to create the subscription, you need the cluster version (major.minor), since it determines the subscription channel. This can be mapped to a variable for step 4 with the following command:
    OC_VERSION=$(oc version -o yaml | grep openshiftVersion | \
      grep -o '[0-9]*[.][0-9]*' | head -1)
    
  4. Lastly, create the subscription for the SR-IOV Operator with the following command:
    cat << EOF | oc create -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: sriov-network-operator-subscription
      namespace: openshift-sriov-network-operator
    spec:
      channel: "${OC_VERSION}"
      name: sriov-network-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
    
    NOTE: Verify the installation progress with the following command (the CSV phase should eventually report Succeeded):
    oc get csv -n openshift-sriov-network-operator \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase
    

Part IV: Creating SR-IOV Policies and Networks

With the SR-IOV Operator successfully deployed, you will need to create two objects that work together before VFs (virtual functions) can be consumed by your virtual machines (VNFs): an SriovNetworkNodePolicy and an SriovNetwork.

SR-IOV Network Node Policies

Think of the SR-IOV Network Node Policy as a declarative method for configuring the NIC on a given node. In order to properly configure this device, you need to know some things about your NIC and where the NIC is installed in your server.

For example, let's say that I have an Intel XXV710-DA2 card installed in the 4th PCIe slot on a Dell T640 server. The first thing that I do is either log into the iDRAC (or iLO, if you're using an HP system), or search within the Dell PowerEdge Technical Guide for the T640 Series Servers. In section 7, under "Networking and PCIe" there is a "PCIe expansion slots" subheading. This tells me that PCIe slot 4 is a Full Length slot which supports x8 PCIe lanes/ports on CPU 2. This means I must have 2 CPUs installed, and considering that the XXV710 card requires x8 lanes according to ark.intel.com, I'm in good shape.

Next, we need to find the vendor and deviceID, so that we can add these items to our SriovNetworkNodePolicy manifest. The easiest way to find these is with the following command:

[core@itamae ~]$ lspci -nn | egrep Ethernet | egrep 710
89:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 [8086:158b] (rev 02)
89:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 [8086:158b] (rev 02)
[core@itamae ~]$

The vendor is 8086 (for Intel), and the card deviceID is 158b.

The next important thing to know is the interface you want to use for VFs. This is simple, as our interface name is ens4f0. But we may want to use some VFs for one namespace/project, and another group of VFs for a different namespace. This isn't a problem, as we're going to assign half of our VFs to the project/namespace jinkit-vms.
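
If you're not sure how the PCI addresses above map to interface names, the operator can tell you directly. Once it's running, each node gets an SriovNetworkNodeState object that lists every SR-IOV capable interface it discovered, including the name, vendor, deviceID, and PCI address. A minimal sketch, again assuming the example node name itamae:

oc get sriovnetworknodestates -n openshift-sriov-network-operator itamae \
  -o jsonpath='{range .status.interfaces[*]}{.name}{"\t"}{.vendor}{"\t"}{.deviceID}{"\t"}{.pciAddress}{"\n"}{end}'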

So here's our manifest:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-sriov-710-ens4f0-0031
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    deviceID: 158b
    pfNames:
      - 'ens4f0#0-31'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 64
  priority: 97
  resourceName: 710_ens4f0_0031

The best way I can explain each of these fields is to show you an oc (and kubectl) feature that you're probably not using: explain.

❯ oc explain SriovNetworkNodePolicy.spec
KIND:     SriovNetworkNodePolicy
VERSION:  sriovnetwork.openshift.io/v1

RESOURCE: spec <Object>

DESCRIPTION:
     SriovNetworkNodePolicySpec defines the desired state of
     SriovNetworkNodePolicy

FIELDS:
   deviceType	<string>
     The driver type for configured VFs. Allowed value "netdevice", "vfio-pci".
     Defaults to netdevice.

   eSwitchMode	<string>
     NIC Device Mode. Allowed value "legacy","switchdev".

   isRdma	<boolean>
     RDMA mode. Defaults to false.

   linkType	<string>
     NIC Link Type. Allowed value "eth", "ETH", "ib", and "IB".

   mtu	<integer>
     MTU of VF

   needVhostNet	<boolean>
     mount vhost-net device. Defaults to false.

   nicSelector	<Object> -required-
     NicSelector selects the NICs to be configured

   nodeSelector	<map[string]string> -required-
     NodeSelector selects the nodes to be configured

   numVfs	<integer> -required-
     Number of VFs for each PF

   priority	<integer>
     Priority of the policy, higher priority policies can override lower ones.

   resourceName	<string> -required-
     SRIOV Network device plugin endpoint resource name
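
One caveat before you apply the policy: the operator may drain and even reboot the node while it creates the VFs, which on a SNO cluster means a short outage. You can watch the configuration progress through the node's SriovNetworkNodeState object, and then confirm the new resource is allocatable on the node. A rough sketch, again assuming the node name itamae and the resourceName from the policy above:

oc get sriovnetworknodestates -n openshift-sriov-network-operator itamae \
  -o jsonpath='{.status.syncStatus}{"\n"}'

oc get node itamae \
  -o jsonpath='{.status.allocatable.openshift\.io/710_ens4f0_0031}{"\n"}'

Once the sync status reports Succeeded, the second command should return the number of VFs in this pool (32 here, given the ens4f0#0-31 range).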

Now let's talk about SR-IOV Networks, because these are going to be namespaced objects which can be used in multi-tenant environments.

SR-IOV Networks

Put simply, an SriovNetwork is a namespaced object (so think multi-tenancy) that maps to a Network Attachment Definition, which in turn can be used for a virtual machine. This needs to be bound to the resourceName field on the SriovNetworkNodePolicy we created earlier - so in our example above, this is 710_ens4f0_0031. This also means that administrators for the jinkit-vms namespace have VFs 0-31 available (per the pfNames ens4f0#0-31).

Another thing that the SriovNetwork does is configure the IPAM type for the Network Attachment Definition. In the example below, we're saying that there's a DHCP server on VLAN 55:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: 710-ens4f0-vlan55-dhcp
  namespace: openshift-sriov-network-operator
spec:
  ipam: |-
    {
      "type": "dhcp"
    }
  networkNamespace: jinkit-vms
  resourceName: 710_ens4f0_0031
  vlan: 55
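
When this is created, the operator renders a matching NetworkAttachmentDefinition into the networkNamespace (jinkit-vms here), which is what virtual machines actually reference. A quick way to confirm it landed, and to inspect the generated CNI config:

oc get network-attachment-definitions -n jinkit-vms

oc get network-attachment-definitions -n jinkit-vms 710-ens4f0-vlan55-dhcp -o yaml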

Part V: Creating Virtual Machines

Now we're on to the most important part, right? We need to create a virtual machine, and ensure that this virtual machine can claim a VF that's been prepared by the SriovNetworkNodePolicy and SriovNetwork.

The important thing to remember is that the virtual machine will have two references for networking, found under the following two manifest paths:

  • virtualmachine.spec.template.spec.networks
  • virtualmachine.spec.template.spec.domain.devices.interfaces

The mapping looks like the following:

spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - macAddress: '02:9b:48:00:10:03'
              model: virtio
              name: fortigate-test-02-port1
              sriov: {}
            - macAddress: '02:9b:48:00:10:04'
              model: virtio
              name: fortigate-test-02-port2
              sriov: {}
            - macAddress: '02:9b:48:00:10:05'
              model: virtio
              name: fortigate-test-02-port3
              sriov: {}
            - macAddress: '02:9b:48:00:10:44'
              model: virtio
              name: fortigate-test-02-port4
              sriov: {}
      networks:
        - multus:
            networkName: 710-ens4f0-vlan90-dhcp
          name: fortigate-test-02-port1
        - multus:
            networkName: 710-ens4f0-vlan50-dhcp
          name: fortigate-test-02-port2
        - multus:
            networkName: 710-ens4f1-vlan40-dhcp
          name: fortigate-test-02-port3
        - multus:
            networkName: 710-ens4f1-vlan20-dhcp
          name: fortigate-test-02-port4

As you can see, the manifest above attaches four VFs to a FortiGate VM series NGFW VNF.
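
If you want to verify that a running VM actually claimed its VFs, look at the virt-launcher pod backing it: Multus records a network-status annotation there that includes the PCI address of each attached VF. A rough sketch, assuming the VM above is named fortigate-test-02, runs in the jinkit-vms namespace, and that your KubeVirt version labels virt-launcher pods with vm.kubevirt.io/name:

oc get pod -n jinkit-vms -l vm.kubevirt.io/name=fortigate-test-02 \
  -o jsonpath='{.items[0].metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'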

Part VI: Troubleshooting Tips

Let's say that you're working with a cluster that has several, or even dozens of, virtual machines, but you only want to look at the associated SR-IOV networks and VFs. I can at least help you get started with a command.

Method: jq

If you're using the popular tool jq, then you can use the following method to search for the SR-IOV network names referenced by the virtual machines deployed in your cluster (look closely at the path):

❯ oc get vms -A -o json | jq '.items[].spec.template.spec.networks[].multus.networkName'
"jinkit-vms/710-ens4f1-vlan4-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f1-vlan25-dhcp"
"710-ens4f0-vlan55-dhcp"
"jinkit-vms/810-ens6f1-vlan4-dhcp"
"jinkit-vms/810-ens6f1-vlan4-dhcp"
"710-ens4f1-vlan20-dhcp"
"jinkit-vms/810-ens6f1-vlan4-dhcp"
null
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan50-dhcp"
"710-ens4f1-vlan40-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan55-dhcp"
"710-ens4f1-vlan45-dhcp"
"710-ens4f1-vlan25-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan50-dhcp"
"710-ens4f0-vlan90-dhcp"
"710-ens4f0-vlan50-dhcp"
"710-ens4f1-vlan20-dhcp"
"710-ens4f1-vlan25-dhcp"

❯ oc get vms -A -o json | jq '.items[].spec.template.spec.networks[].multus.networkName' | wc -l
26

Method: jsonpath

What if you don't have jq installed on your system? No problem, we've got you covered. You can use the following jsonpath approach with oc or kubectl as well:

❯ oc get vms -A -o=jsonpath='{range .items[*]}{range .spec.template.spec.networks[*]}{.multus.networkName}{"\n"}{end}'
jinkit-vms/710-ens4f1-vlan4-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f1-vlan25-dhcp
710-ens4f0-vlan55-dhcp
jinkit-vms/810-ens6f1-vlan4-dhcp
jinkit-vms/810-ens6f1-vlan4-dhcp
710-ens4f1-vlan20-dhcp
jinkit-vms/810-ens6f1-vlan4-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan50-dhcp
710-ens4f1-vlan40-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan55-dhcp
710-ens4f1-vlan45-dhcp
710-ens4f1-vlan25-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan50-dhcp
710-ens4f0-vlan90-dhcp
710-ens4f0-vlan50-dhcp
710-ens4f1-vlan20-dhcp
710-ens4f1-vlan25-dhcp

❯ oc get vms -A -o=jsonpath='{range .items[*]}{range .spec.template.spec.networks[*]}{.multus.networkName}{"\n"}{end}' | wc -l
26

The choice is yours!
