Change the Rotational Status of HDDs in OpenShift 4.17 (ODF)
Author: Brandon B. Jozsa
"The world hates change, yet it is the only thing that has brought progress."
- Charles Kettering
- Part I: Introduction
- Part II: Gathering Drive Information
- Preparation: Gather information about your environment
- Part III: Creating a MachineConfig
- Using Butane to Create a MachineConfig
- Verification
- Part IV: Conclusion
Part I: Introduction
The other day I was working on OpenShift 4.17, and attempting to install the OpenShift Data Foundation (ODF) operator. There was a subtle change made that wasn't called out in the release notes, so I wanted to discuss it here for a moment.
In OpenShift Data Foundation 4.17, for new deployments (this is key, really), you can no longer install the operator on rotational disks. To be fair, ODF always had this requirement, but it would let you continue to install the operator on rotational disks anyway, with the understanding that Red Hat would not support issues related to storage latency, etc. This should be fairly obvious, but always run ODF on non-rotational drives, such as SSD or preferably NVMe. It should be pointed out that there are scenarios where you can run ODF on SAN-presented disks as well, but this is beyond the scope of what I want to discuss today.
But let me discuss my lab scenario, and why the workaround I am presenting today is acceptable, specifically for my testing purposes. I am running an OpenShift SNO deployment on a Dell T550, with Dell-branded Micron 9200 Pro U.2 drives. These are very capable drives, and well within the specification for running ODF.
However, I am running OpenShift Virtualization within this SNO environment, and because of this, the disks of any Virtual Machines running within this OpenShift environment will be presented by QEMU as Rotational drives.
Recently, I started deploying OpenShift on top of OpenShift Virtualization, so I can maximize my hardware and so I can wholly rely on OpenShift as my primary virtualization platform for 2025. (I will create a blog post on this very soon, so stay tuned).
So you may see where this is going. On my host OpenShift environment, I am deploying a guest OpenShift environment, and within that guest environment I want to deploy ODF and yes, OpenShift Virtualization (along with some amazing network scenarios I've been working on; again, more to come on this soon).
So in summary, the disks are fast, and I don't have to worry about this. But they're being labelled as Rotational disks, when in fact they're non-rotational.
I want to change that, and I'll tell you how to do this as well in the next section.
Part II: Gathering Drive Information
The first thing we're going to do is collect a bunch of useful information about our disks. You may not need all of this, but it's always good practice to know your environment before you start changing things. I can't tell you how many times a customer knew their environment well, but something still changed unexpectedly, and a review of the environment prevented a potential disruption to their proof of concept. Leave nothing to chance.
Preparation: Gather information about your environment
- Collect information about your environment.

    ❯ oc get nodes
    NAME   STATUS   ROLES                         AGE   VERSION
    cp0    Ready    control-plane,master,worker   66d   v1.29.8+f10c92d
    cp1    Ready    control-plane,master,worker   66d   v1.29.8+f10c92d
    cp2    Ready    control-plane,master,worker   66d   v1.29.8+f10c92d
- Have a look at the physical disks associated with each node.

    NODE_NAME=node/cp0
    cat <<EOF | oc debug $NODE_NAME
    chroot /host
    lsblk -o NAME,ROTA,SIZE,TYPE
    EOF
The output will look similar to the following:
    sh-5.1# lsblk -o NAME,ROTA,SIZE,TYPE
    sh-5.1# exit
    NAME   ROTA   SIZE TYPE
    sda       1   1.8T disk
    sdb       1   5.5T disk
    sdc       1   7.3T disk
    sdd       1   3.6T disk
    sde       1   931G disk
    |-sde1    1     1M part
    |-sde2    1   127M part
    |-sde3    1   384M part
    `-sde4    1 930.5G part
    sr0       1  1024M rom

    Removing debug pod ...
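With more than a handful of disks or nodes, it can help to reduce that lsblk output to just the whole disks the kernel flags as rotational. The following is a minimal sketch and not part of the original workflow: the function names list_rotational_disks and rotational_disks_all_nodes are hypothetical, and the filter assumes the NAME,ROTA,SIZE,TYPE column order used above.

```shell
#!/bin/bash
# Print the names of whole disks (TYPE=disk) whose ROTA column is 1.
# Reads `lsblk -o NAME,ROTA,SIZE,TYPE` output on stdin; header, partition,
# and rom lines are skipped because they fail one of the two conditions.
list_rotational_disks() {
  awk '$2 == "1" && $4 == "disk" {print $1}'
}

# Run the same check against every node in the cluster.
# Assumes `oc` is installed and already logged in to the cluster.
rotational_disks_all_nodes() {
  local node
  for node in $(oc get nodes -o name); do
    echo "== ${node} =="
    oc debug "${node}" -- chroot /host lsblk -o NAME,ROTA,SIZE,TYPE 2>/dev/null \
      | list_rotational_disks
  done
}
```

Calling rotational_disks_all_nodes prints one header per node followed by the disks that would currently be rejected as rotational.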
- Now I want you to run the following command, which will list all of the mappings between disk assignments (e.g. /dev/sdc) and disk by-path entries (e.g. /dev/disk/by-path/).

    NODE_NAME=node/cp0
    cat <<EOF | oc debug $NODE_NAME
    chroot /host
    ls -aslc /dev/disk/by-path/
    EOF
The output will look like the following:
    sh-5.1# ls -aslc /dev/disk/by-path/
    sh-5.1# exit
    total 0
    0 drwxr-xr-x. 2 root root 260 Sep  7 00:30 .
    0 drwxr-xr-x. 9 root root 180 Sep  7 00:30 ..
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:00:17.0-ata-8 -> ../../sr0
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:00:17.0-ata-8.0 -> ../../sr0
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:0:0 -> ../../sda
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:1:0 -> ../../sdc
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:2:0 -> ../../sdb
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:3:0 -> ../../sdd
    0 lrwxrwxrwx. 1 root root   9 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0 -> ../../sde
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part1 -> ../../sde1
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part2 -> ../../sde2
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part3 -> ../../sde3
    0 lrwxrwxrwx. 1 root root  10 Sep  7 00:30 pci-0000:1a:00.0-scsi-0:2:4:0-part4 -> ../../sde4

    Removing debug pod ...
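If you only need the mapping itself, the symlinks can be resolved in a loop rather than read out of the ls listing. This is a small sketch, assuming it runs in a shell on the node itself (for example after chroot /host); map_by_path is a hypothetical helper name, and the directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# Print "<by-path name> -> <device name>" for every symlink in a directory.
# Defaults to /dev/disk/by-path; non-symlink entries are skipped.
map_by_path() {
  local dir="${1:-/dev/disk/by-path}"
  local link
  for link in "${dir}"/*; do
    [ -L "${link}" ] || continue
    # readlink returns the raw target (e.g. ../../sda); keep only the name.
    printf '%s -> %s\n' "$(basename "${link}")" "$(basename "$(readlink "${link}")")"
  done
}
```

Because by-path names are tied to the controller and port rather than discovery order, this mapping is a useful cross-check if the sdX letters ever change between boots.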
- Lastly, if you want to collect read/write IOPS information about your drives, in order to determine whether you're within specification to deploy ODF on your "Rotational" drives, you can follow the GitHub Gist link HERE, which will walk you through gathering this information.
Part III: Creating a MachineConfig
If you haven't created a MachineConfig yet, it can seem a bit confusing at first. So let me explain this at a very high level, and then I will get into the details of what we need to change below.
As you have probably heard before, RHCOS is an immutable, container-focused operating system. This means that the core OS is read-only, and an operator called the Machine Config Operator (MCO) controls the state of the OS at all times. So if someone makes a breaking change to the OS, the MCO can return that OS to its last known working state. This helps organizations make sweeping changes across their environments centrally, and in a safe way. Red Hat treats RHCOS more like an appliance than a traditional OS like RHEL. When I say that the OS is a container-focused OS, I also mean that there is no package manager like Yum or DNF. So then, how would you get packages or make changes to the OS?
This is where Machine Configs come into play. With MCs, you can create files, add binary utilities, create systemd units, etc.
This is what we're going to leverage, in order to create a boot/watchdog service that can make our HDD devices look like NonRotational devices for ODF.
Using Butane to Create a MachineConfig
Source Documentation: HERE
- The first thing I would like you to do is download a Red Hat utility called butane. I'm going to be installing the macOS binary (the darwin-amd64 build), so be sure to change the ARCH variable to match your system if required.

    ARCH="darwin-amd64"
    mkdir -p ~/.local/bin/
    curl https://mirror.openshift.com/pub/openshift-v4/clients/butane/latest/butane-$ARCH --output ~/.local/bin/butane
    chmod +x ~/.local/bin/butane
    butane --version
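If you would rather not set ARCH by hand, it can be derived from uname. Treat the following as a sketch under assumptions: butane_arch is a hypothetical helper, only darwin-amd64 is confirmed by this post, and the Linux suffixes (amd64, aarch64) are my assumption about the file names in the mirror directory, so check the directory listing before relying on them.

```shell
#!/bin/bash
# Map an OS name and machine type (as printed by `uname -s` and `uname -m`)
# to a butane download suffix.
# ASSUMPTION: apart from darwin-amd64 (used in this post), the suffixes are
# guesses at the mirror's file names; verify them in the directory listing.
butane_arch() {
  local os="${1:-$(uname -s)}"
  local machine="${2:-$(uname -m)}"
  case "${os}/${machine}" in
    Darwin/*)      echo "darwin-amd64" ;;  # the build used in this post
    Linux/x86_64)  echo "amd64" ;;         # assumed file name
    Linux/aarch64) echo "aarch64" ;;       # assumed file name
    *)
      echo "unsupported platform: ${os}/${machine}" >&2
      return 1
      ;;
  esac
}
```

With that in place, ARCH="$(butane_arch)" can replace the hard-coded value in the download step above.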
- Now, have a look at the docs for Butane if you like, but I'm going to show you everything you need to do right here. We're going to use the butane utility to create a custom MachineConfig that we can apply to each of our nodes. So let's create a file called 99-fake-nonrotational-mc.bu.

NOTE: You do need to pay attention to the label that I am targeting in my example below. I am targeting machineconfiguration.openshift.io/role: master because I am running this MachineConfig against a compact cluster (3 schedulable control-plane nodes). If you have dedicated infra nodes, or are using a full cluster deployment, change your label accordingly.

    cat << 'EOF' > 99-fake-nonrotational-mc.bu
    variant: openshift
    version: 4.17.0
    metadata:
      name: 99-fake-nonrotational
      labels:
        machineconfiguration.openshift.io/role: master
    storage:
      files:
        - path: /etc/fake-nonrotational.sh
          mode: 0755
          contents:
            inline: |
              #!/bin/bash
              # Find disks with a size of 120G:
              target_disks=$(lsblk -dn -o NAME,SIZE | awk '$2 == "120G" {print $1}')

              # Mark these disks as NonRotational:
              for disk in $target_disks; do
                echo "Changing disk: /dev/$disk to non-rotational."
                echo 0 > /sys/block/$disk/queue/rotational
              done
    systemd:
      units:
        - name: fake-nonrotational.service
          enabled: true
          contents: |
            [Unit]
            Description=Force specific-size disks to be nonrotational
            After=local-fs.target
            Wants=local-fs.target

            [Service]
            Type=oneshot
            ExecStart=/etc/fake-nonrotational.sh
            Restart=on-failure
            RestartSec=5s
            RemainAfterExit=true
            User=root

            [Install]
            WantedBy=multi-user.target
    EOF
What you'll notice is that we have the script embedded in our butane .bu file (see the example below). If you want to make changes to the script for your use case, make these changes now.

    #!/bin/bash
    # Find disks with a size of 120G:
    target_disks=$(lsblk -dn -o NAME,SIZE | awk '$2 == "120G" {print $1}')

    # Mark these disks as NonRotational:
    for disk in $target_disks; do
      echo "Changing disk: /dev/$disk to non-rotational."
      echo 0 > /sys/block/$disk/queue/rotational
    done
My script example above is doing the following:
- Look for any disk whose reported size is exactly 120G.
- Change each matching disk from a Rotational disk to a NonRotational disk by writing 0 to /sys/block/$disk/queue/rotational (only disks that match 120G are touched).
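If your disks are a different size, the two steps above can be split into small functions with the size as a parameter. This is a sketch of the same technique, not the script that ships in the MachineConfig: select_disks_by_size and set_nonrotational are hypothetical names, and the sysfs root is an argument purely so the write step can be rehearsed against a scratch directory before pointing it at /sys.

```shell
#!/bin/bash
# Print the names of disks whose SIZE column equals the requested size.
# Reads `lsblk -dn -o NAME,SIZE` output on stdin.
select_disks_by_size() {
  local size="$1"
  awk -v want="${size}" '$2 == want {print $1}'
}

# Write 0 to <sysfs_root>/block/<disk>/queue/rotational for each named disk.
# Usage: set_nonrotational <sysfs_root> <disk> [<disk> ...]
set_nonrotational() {
  local sysfs_root="$1"
  local disk
  shift
  for disk in "$@"; do
    echo "Changing disk: /dev/${disk} to non-rotational."
    echo 0 > "${sysfs_root}/block/${disk}/queue/rotational"
  done
}
```

On a node, the equivalent of the original script would be: set_nonrotational /sys $(lsblk -dn -o NAME,SIZE | select_disks_by_size 120G). Matching on the size string is exact, so 120G will not match a disk that lsblk reports as 119.2G.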
- Now you can run butane to create the MachineConfig by using the following command.

    butane 99-fake-nonrotational-mc.bu -o 99-fake-nonrotational-mc.yaml
- Below is the final output from butane. I am simply going to add an annotations.description to my MachineConfig, and once I've done that I'm going to apply the manifest to my cluster like so.

    cat <<EOF | oc apply -f -
    ---
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      annotations:
        description: "MC sets disks with size 120G to non-rotational. Documentation found at: https://tinyurl.com/mc-nonrotational-hack"
      name: 99-fake-nonrotational
    spec:
      config:
        ignition:
          version: 3.4.0
        storage:
          files:
            - contents:
                compression: gzip
                source: data:;base64,H4sIAAAAAAAC/0zNvU7DQBDE8X6fYnBOCkiYTVKCjIRQQBTQ0NGgc+5ir2ztgu9CxNe7I44i9P/fzOyIW1FufeqJZrgRDQiShoS95B4eST4ibIvlanFL2U9dzM8laNzxmNpxQB0UteHh6n59+nj3tMYX/H7A3K3QNKh+YYXPl0k0wy2/5ydEW5vKC0Th/o9eIBgBQNz0huq699qJdiU+B4f4xq7AbFDTerLss5j68aw6uAUuwek9cTvaZvgT/LqLu8gHQME00k8AAAD//0gklW0AAQAA
              mode: 493
              path: /etc/fake-nonrotational.sh
        systemd:
          units:
            - contents: |
                [Unit]
                Description=Force specific-size disks to be nonrotational
                After=local-fs.target
                Wants=local-fs.target

                [Service]
                Type=oneshot
                ExecStart=/etc/fake-nonrotational.sh
                Restart=on-failure
                RestartSec=5s
                RemainAfterExit=true
                User=root

                [Install]
                WantedBy=multi-user.target
              enabled: true
              name: fake-nonrotational.service
    EOF
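Two things in that generated manifest look different from the .bu file: the script is stored as a gzip-compressed, base64-encoded data URL, and the mode is written in decimal (493) rather than octal (0755). Both can be checked by hand before applying anything. A minimal sketch, with hypothetical helper names decode_ignition_source and mode_to_octal, assuming base64 and gunzip are available on your workstation:

```shell
#!/bin/bash
# Decode a gzip-compressed Ignition data URL of the form
# "data:;base64,<payload>" and print the original file contents.
decode_ignition_source() {
  local source="$1"
  # Strip the data-URL prefix, then undo base64 and gzip in that order.
  printf '%s' "${source#"data:;base64,"}" | base64 --decode | gunzip -c
}

# Convert a decimal file mode (as written in the manifest) back to octal.
mode_to_octal() {
  printf '%o\n' "$1"
}
```

Passing the full source value from the manifest to decode_ignition_source should print the embedded script, and mode_to_octal 493 prints 755, which matches the mode set in the .bu file.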
Verification
- To verify that things are working, use the following command against one of your nodes where the MC was applied. A value of 0 means the kernel now reports the disk as non-rotational.

    NODE_NAME=node/cp0
    cat <<EOF | oc debug $NODE_NAME
    cat /sys/block/sda/queue/rotational
    cat /sys/block/sdb/queue/rotational
    EOF
You can also check the journald logs, since this was created as a service.

    NODE_NAME=node/cp0
    cat <<EOF | oc debug $NODE_NAME
    chroot /host
    journalctl | grep -e fake-nonrotational
    EOF
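Rather than naming each disk, the flag can be read for every block device in one pass. This is a sketch with a hypothetical function name, report_rotational; the sysfs root defaults to /sys and is only a parameter so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# Print "<disk> <flag>" for every block device under <sysfs_root>/block,
# where flag 0 means non-rotational and 1 means rotational.
report_rotational() {
  local sysfs_root="${1:-/sys}"
  local flag_file disk
  for flag_file in "${sysfs_root}"/block/*/queue/rotational; do
    [ -r "${flag_file}" ] || continue
    # Reduce .../block/<disk>/queue/rotational to just <disk>.
    disk="${flag_file#"${sysfs_root}"/block/}"
    disk="${disk%%/*}"
    printf '%s %s\n' "${disk}" "$(cat "${flag_file}")"
  done
}
```

Run on a node, any disk the script matched should show 0. If the values are still 1, check with oc get mcp that the pool has finished rolling out the new MachineConfig before digging further.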
Part IV: Conclusion
So, there you have it! With Linux, you pretty much control your own destiny. I love little hacks like this, because it enables people to get the most out of their home lab environments. And for larger organizations, the more that people learn in safe lab environments, the more confident they can be when working on production environments.
I hope you've enjoyed today's little blog post. I was pretty happy to get this working, and I'm even happier to pass this information along to others!
It's the holiday season (Dec 21st 2024). Enjoy the time with your family and friends! Until the next post, take care everyone.
- v1k0d3n (Brandon B. Jozsa)