Change the Rotational Status of HDDs in OpenShift 4.17 (ODF)

Author: Brandon B. Jozsa

"The world hates change, yet it is the only thing that has brought progress."
- Charles Kettering

Table of Contents

- Part I: Introduction
- Part II: Gathering Drive Information
    - Preparation: Gather information about your environment
- Part III: Creating a MachineConfig
    - Using Butane to Create a MachineConfig
    - Verification
- Part IV: Conclusion

Part I: Introduction

The other day I was working on OpenShift 4.17, and attempting to install the OpenShift Data Foundation (ODF) operator. There was a subtle change made that wasn't called out in the release notes, so I wanted to discuss it here for a moment.

In OpenShift Data Foundation 4.17, for new deployments (this is key, really), you can no longer install the operator on rotational disks. To be fair, ODF always had this requirement, but it would let you install the operator on rotational disks anyway, with the understanding that Red Hat would not support issues related to storage latency, etc. This should be fairly obvious, but always run ODF on non-rotational drives, such as SSD or preferably NVMe. There are scenarios where you can run ODF on SAN-presented disks as well, but this is beyond the scope of what I want to discuss today.

But let me discuss my lab scenario, and why the workaround I am presenting today is acceptable, specifically for my testing purposes. I am running an OpenShift SNO deployment on a Dell T550, with Dell-branded Micron 9200 Pro U.2 drives. These are very capable drives, and well within the specification for running ODF.

However, I am running OpenShift Virtualization within this SNO environment, and because of this, any of the Virtual Machines running within this OpenShift environment will be presented from QEMU as Rotational drives.

Recently, I started deploying OpenShift on top of OpenShift Virtualization, so I can maximize my hardware and so I can wholly rely on OpenShift as my primary virtualization platform for 2025. (I will create a blog post on this very soon, so stay tuned).

So you may see where this is going. On my host OpenShift environment, I am deploying a guest OpenShift environment, and within that guest environment I want to deploy ODF and yes, OpenShift Virtualization (along with some amazing network scenarios I've been working on - again, more to come on this soon).

So in summary, the disks are fast - I don't have to worry about this. But they're being labelled as Rotational disks, when in fact they're actually non-rotational.

I want to change that, and I'll tell you how to do this as well in the next section.

Part II: Gathering Drive Information

The first thing we're going to do is collect a bunch of useful information about our disks. You may not need all of this, but it's always good practice to know your environment before you start changing things. I can't tell you how many times a customer knew their environment well, yet something had changed unexpectedly, and a review of the environment prevented a potential disruption to their proof of concept. Leave nothing to chance.

Preparation: Gather information about your environment

  1. Collect information about your environment.

    ❯ oc get nodes
    NAME       STATUS   ROLES                         AGE   VERSION
    cp0        Ready    control-plane,master,worker   66d   v1.29.8+f10c92d
    cp1        Ready    control-plane,master,worker   66d   v1.29.8+f10c92d
    cp2        Ready    control-plane,master,worker   66d   v1.29.8+f10c92d
    
  2. Have a look at the physical disks attached to each of your nodes.

    NODE_NAME=node/cp0
    
    cat <<EOF | oc debug $NODE_NAME
    chroot /host
    lsblk -o NAME,ROTA,SIZE,TYPE
    EOF
    

    The output will look similar to the following:

    sh-5.1# lsblk -o NAME,ROTA,SIZE,TYPE
    sh-5.1# exit
    NAME   ROTA   SIZE TYPE
    sda       1   1.8T disk
    sdb       1   5.5T disk
    sdc       1   7.3T disk
    sdd       1   3.6T disk
    sde       1   931G disk
    |-sde1    1     1M part
    |-sde2    1   127M part
    |-sde3    1   384M part
    `-sde4    1 930.5G part
    sr0       1  1024M rom
    
    Removing debug pod ...
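
If a node has many disks, it can help to reduce the lsblk output to only the whole disks that are currently flagged as rotational. The following is a minimal sketch, not part of the original procedure; `rotational_disks` is a hypothetical helper name, and it assumes the same column order used above (NAME,ROTA,SIZE,TYPE).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of whole disks that lsblk reports
# as rotational (ROTA=1). Expects `lsblk -o NAME,ROTA,SIZE,TYPE` on stdin.
# Header lines, partitions, and rom devices are skipped because their
# fourth column is not "disk".
rotational_disks() {
    awk '$4 == "disk" && $2 == 1 { print $1 }'
}

# Example, run inside the debug shell after `chroot /host`:
#   lsblk -o NAME,ROTA,SIZE,TYPE | rotational_disks
```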
    
  3. Now I want you to run the following command, which lists the mappings between device names (e.g. /dev/sdc) and their by-path links (under /dev/disk/by-path/).

    NODE_NAME=node/cp0
    
    cat <<EOF | oc debug $NODE_NAME
    chroot /host
    ls -aslc /dev/disk/by-path/
    EOF
    

    The output will look like the following:

    sh-5.1# ls -aslc /dev/disk/by-path/
    sh-5.1# exit
    total 0
    0 drwxr-xr-x. 2 root root 260 Sep  7 00:30 .
    0 drwxr-xr-x. 9 root root 180 Sep  7 00:30 ..
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:00:17.0-ata-8 -> ../../sr0
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:00:17.0-ata-8.0 -> ../../sr0
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:0:0 -> ../../sda
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:1:0 -> ../../sdc
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:2:0 -> ../../sdb
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:3:0 -> ../../sdd
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0 -> ../../sde
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part1 -> ../../sde1
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part2 -> ../../sde2
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part3 -> ../../sde3
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part4 -> ../../sde4
    
    Removing debug pod ...
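
Kernel device names such as sdc are not guaranteed to stay the same across reboots, which is why the by-path links are worth recording. As a rough sketch, the lookup from a device name back to its by-path link could be scripted like this; `bypath_for` is a hypothetical helper name, and the directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the by-path link(s) that point at a given
# kernel device name.
# Usage: bypath_for sde [/dev/disk/by-path]
bypath_for() {
    dev="$1"
    dir="${2:-/dev/disk/by-path}"
    for link in "$dir"/*; do
        # Only symlinks are of interest; skip anything else.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        # Links look like ../../sde, so compare only the final component.
        if [ "$(basename "$target")" = "$dev" ]; then
            echo "$link"
        fi
    done
}
```

Because the comparison is exact, partition links such as ../../sde1 are not reported when you ask for sde.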
    
  4. Lastly, if you want to collect read/write IOPS information about your drives, in order to determine whether you're within specification to deploy ODF on your "Rotational" drives, you can follow the GitHub Gist link HERE, which will walk you through gathering this information.

Part III: Creating a MachineConfig

If you haven't created a MachineConfig yet, it can seem a bit confusing at first. So let me explain this at a very high level, and then I will get into the details for what we need to change below.

As you have probably heard before, RHCOS is an immutable, container-focused operating system. This means that the core OS is read-only and is managed by an operator called the "Machine Config Operator" (MCO). The MCO controls the state of the OS at all times. So if someone makes a breaking change to the OS, the MCO can return that OS to its last known working state. This helps organizations make sweeping changes across their fleet centrally, and in a safe way. Red Hat treats RHCOS more like an appliance than a traditional OS like RHEL. When I say that the OS is a container-focused OS, I also mean that there is no package manager like Yum or DNF. So then, how would you get packages or make changes to the OS?

This is where Machine Configs come into play. With MCs, you can create files, add binary utilities, create systemd units, etc.

This is what we're going to leverage, in order to create a boot/watchdog service that can make our HDD devices look like NonRotational devices for ODF.

Using Butane to Create a MachineConfig

Source Documentation: HERE

  1. The first thing I would like you to do is download a Red Hat utility called butane. I'm installing the macOS binary (darwin-amd64) here, so be sure to change the ARCH variable to match your own system if required.

    ARCH="darwin-amd64"
    
    mkdir -p ~/.local/bin/
    curl https://mirror.openshift.com/pub/openshift-v4/clients/butane/latest/butane-$ARCH --output ~/.local/bin/butane
    chmod +x ~/.local/bin/butane
    
    butane --version
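
If you would rather not set ARCH by hand, a small mapping from `uname` output can pick it for you. This is only a sketch: `butane_arch` is a hypothetical helper, and the suffixes it returns (amd64, aarch64, darwin-amd64) are assumptions about the file names published on the mirror, so check the directory listing before relying on them. On macOS it always returns the same darwin-amd64 value used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -s` and `uname -m` values to a butane
# download suffix. ASSUMPTION: the suffixes below match the file names on
# the mirror; verify against the directory listing.
butane_arch() {
    os="$1"
    machine="$2"
    case "$os/$machine" in
        Linux/x86_64)  echo "amd64" ;;
        Linux/aarch64) echo "aarch64" ;;
        Darwin/*)      echo "darwin-amd64" ;;  # same value as the step above
        *)             echo "unsupported: $os/$machine" >&2; return 1 ;;
    esac
}

# Example:
#   ARCH=$(butane_arch "$(uname -s)" "$(uname -m)")
```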
    
  2. The Butane docs are worth a look, but I'm going to show you everything you need right here. We're going to use the butane utility to create a custom MachineConfig that we can apply to each of our nodes. So let's create a file called 99-fake-nonrotational-mc.bu.

    NOTE: You do need to pay attention to the label that I am targeting in my example below. I am targeting machineconfiguration.openshift.io/role: master because I am running this MachineConfig against a compact cluster (3 schedulable control-plane nodes). If you have dedicated infra nodes, or are using a full cluster deployment, change your label accordingly.

    cat << 'EOF' > 99-fake-nonrotational-mc.bu
    variant: openshift
    version: 4.17.0
    metadata:
      name: 99-fake-nonrotational
      labels:
        machineconfiguration.openshift.io/role: master
    storage:
      files:
        - path: /etc/fake-nonrotational.sh
          mode: 0755
          contents:
            inline: |
              #!/bin/bash
    
              # Find disks with a size of 120G:
              target_disks=$(lsblk -dn -o NAME,SIZE | awk '$2 == "120G" {print $1}')
    
              # Mark these disks as NonRotational:
              for disk in $target_disks; do
                  echo "Changing disk: /dev/$disk to non-rotational."
                  echo 0 > /sys/block/$disk/queue/rotational
              done
    systemd:
      units:
        - name: fake-nonrotational.service
          enabled: true
          contents: |
            [Unit]
            Description=Force specific-size disks to be nonrotational
            After=local-fs.target
            Wants=local-fs.target
    
            [Service]
            Type=oneshot
            ExecStart=/etc/fake-nonrotational.sh
            Restart=on-failure
            RestartSec=5s
            RemainAfterExit=true
            User=root
    
            [Install]
            WantedBy=multi-user.target
    EOF
    

    What you'll notice is that we have the script embedded in our butane .bu file (see the example below). If you want to make changes to the script for your use case, make these changes now.

    #!/bin/bash
    
    # Find disks with a size of 120G:
    target_disks=$(lsblk -dn -o NAME,SIZE | awk '$2 == "120G" {print $1}')
    
    # Mark that disk as NonRotational:
    for disk in $target_disks; do
        echo "Changing disk: /dev/$disk to non-rotational."
        echo 0 > /sys/block/$disk/queue/rotational
    done
    

    My script example above is doing the following:

    • Look for any disk whose size is reported as exactly 120G
    • Change the disk from a Rotational disk to a NonRotational disk by writing 0 to /sys/block/$disk/queue/rotational (but only for disks that match 120G)

  3. Now you can run butane to create the MachineConfig by using the following command.

    butane 99-fake-nonrotational-mc.bu -o 99-fake-nonrotational-mc.yaml
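
Before the generated MachineConfig goes anywhere near a cluster, the script's logic can be dry-run on a workstation. The sketch below splits the embedded script into two hypothetical functions (`select_disks_by_size` and `set_nonrotational`, names of my own choosing) and takes the sysfs directory as a parameter, so it can be pointed at a scratch directory instead of the real /sys/block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print disk names whose SIZE column equals $1 exactly.
# Expects `lsblk -dn -o NAME,SIZE` output on stdin.
select_disks_by_size() {
    awk -v size="$1" '$2 == size { print $1 }'
}

# Hypothetical helper: write 0 to <dir>/<disk>/queue/rotational for each
# disk name given. <dir> is /sys/block on a real node, or a scratch
# directory when testing.
set_nonrotational() {
    sys_block_dir="$1"
    shift
    for disk in "$@"; do
        flag="$sys_block_dir/$disk/queue/rotational"
        if [ -w "$flag" ]; then
            echo 0 > "$flag"
            echo "Changed /dev/$disk to non-rotational."
        else
            echo "Skipping $disk: $flag is not writable." >&2
        fi
    done
}

# Example on a real node (as root):
#   set_nonrotational /sys/block $(lsblk -dn -o NAME,SIZE | select_disks_by_size 120G)
```

Note that the match is on the exact string lsblk prints, so a disk shown as 119.9G or 120.1G is ignored. If that is too brittle for your environment, `lsblk -b` prints sizes in bytes, which you can compare numerically instead.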
    
  4. Below is the MachineConfig that butane generated for me. The only change I made was adding an annotations.description to the MachineConfig, and then I applied the manifest to my cluster like so.

    cat <<EOF | oc apply -f -
    ---
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      annotations:
        description: "MC sets disks with size 120G to non-rotational. Documentation found at: https://tinyurl.com/mc-nonrotational-hack"
      name: 99-fake-nonrotational
    spec:
      config:
        ignition:
          version: 3.4.0
        storage:
          files:
            - contents:
                compression: gzip
                source: data:;base64,H4sIAAAAAAAC/0zNvU7DQBDE8X6fYnBOCkiYTVKCjIRQQBTQ0NGgc+5ir2ztgu9CxNe7I44i9P/fzOyIW1FufeqJZrgRDQiShoS95B4eST4ibIvlanFL2U9dzM8laNzxmNpxQB0UteHh6n59+nj3tMYX/H7A3K3QNKh+YYXPl0k0wy2/5ydEW5vKC0Th/o9eIBgBQNz0huq699qJdiU+B4f4xq7AbFDTerLss5j68aw6uAUuwek9cTvaZvgT/LqLu8gHQME00k8AAAD//0gklW0AAQAA
              mode: 493
              path: /etc/fake-nonrotational.sh
        systemd:
          units:
            - contents: |
                [Unit]
                Description=Force specific-size disks to be nonrotational
                After=local-fs.target
                Wants=local-fs.target
    
                [Service]
                Type=oneshot
                ExecStart=/etc/fake-nonrotational.sh
                Restart=on-failure
                RestartSec=5s
                RemainAfterExit=true
                User=root
    
                [Install]
                WantedBy=multi-user.target
              enabled: true
              name: fake-nonrotational.service
    EOF
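
Since butane stores the script as a gzip-compressed, base64-encoded data URL, it is not human-readable in the manifest. If you want to confirm what was actually embedded, something along these lines should decode it; `decode_ignition_source` is a hypothetical helper name, and it assumes GNU-style `base64 -d` and `gzip` are available on your workstation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a gzip-compressed, base64 data URL of the
# form data:;base64,<payload> read from stdin, and print the original file.
decode_ignition_source() {
    # Strip everything up to and including the first comma, then decode.
    sed 's/^data:[^,]*,//' | base64 -d | gzip -dc
}

# Example against the applied object:
#   oc get machineconfig 99-fake-nonrotational \
#     -o jsonpath='{.spec.config.storage.files[0].contents.source}' \
#     | decode_ignition_source
```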
    

Verification

  1. To verify that things are working, run the following command against one of the nodes where the MC was applied. A value of 0 means the disk is now reported as non-rotational.

    NODE_NAME=node/cp0
    
    cat <<EOF | oc debug $NODE_NAME
      cat /sys/block/sda/queue/rotational
      cat /sys/block/sdb/queue/rotational
    EOF
    

    You can also check the journald logs, since this was created as a systemd service.

    NODE_NAME=node/cp0
    
    cat <<EOF | oc debug $NODE_NAME
    chroot /host
    
    journalctl | grep -e fake-nonrotational
    EOF
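
To check every disk in one pass rather than naming sda and sdb individually, a loop over sysfs works as well. This is a sketch only; `show_rotational` is a hypothetical helper, and the optional directory argument is there so it can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<disk> <flag>" for every block device that
# exposes queue/rotational under the given directory (default /sys/block).
show_rotational() {
    sys_block_dir="${1:-/sys/block}"
    for flag in "$sys_block_dir"/*/queue/rotational; do
        [ -r "$flag" ] || continue
        # The disk name is two directory levels above the flag file.
        disk=$(basename "$(dirname "$(dirname "$flag")")")
        echo "$disk $(cat "$flag")"
    done
}

# Example, run inside the debug shell:
#   show_rotational
```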
    

Part IV: Conclusion

So, there you have it! With Linux, you pretty much control your own destiny. I love little hacks like this, because it enables people to get the most out of their home lab environments. And for larger organizations, the more that people learn in safe lab environments, the more confident they can be when working on production environments.

I hope you've enjoyed today's little blog post. I was pretty happy to get this working, and I'm even happier to pass this information along to others!

It's the holiday season (Dec 21st 2024). Enjoy the time with your family and friends! Until the next post, take care everyone.

- v1k0d3n (Brandon B. Jozsa)