ZTP for OpenShift Alpha Clusters

OpenShift Aug 20, 2022

Author: Brandon B. Jozsa

"I hear and I forget. I see and I remember. I do and I understand."
- Confucius

Table of Contents

- Part I: Introduction
- Part II: Re-thinking Bare Metal Deployments
    - Proof of Concept Overview
        - Platform Operations
        - Field Technicians
        - Basic Workflow (Overview)
- Part III: PoC Prerequisites
    - Download the openshift-install Binary
    - RHEL-based Prerequisite Packages
- Part IV: Demonstration
    - Step 1: Create Deployment Manifests
    - Step 2: Generate the OpenShift Installation Media
    - Step 3: Boot OpenShift Installation Media
    - Post-Deployment Troubleshooting
    - Accessing the Cluster
- Final Thoughts
- Appendix

Part I: Introduction

I'm so excited for what this year is going to bring for OpenShift users, and today I want to give you a small peek into what our incredible project management and engineering teams have been up to recently.

If you've ever deployed an OpenShift cluster within a cloud provider using the Installer-Provisioned Infrastructure (IPI) deployment method, then you are most likely familiar with a utility called openshift-install. There are some very useful and interesting changes coming to this utility with respect to how bare metal can be provisioned. As I describe this functionality, you can think of edge-based scenarios such as RAN or MEC deployments.

A large service provider will have thousands of compute endpoints deployed to the edge for 5G. As they look to disaggregate their infrastructure and use solutions like OpenShift/Kubernetes, the deployments still need to be:

  • Reliable
  • Repeatable
  • Cost-Efficient
  • Easy to Install

As service providers continue to evolve and migrate their deployments to Cloud Native patterns, there is one question that I always want to ask:

"Does the overall solution expect your RF Field Engineers to become Kubernetes experts, or has the solution evolved to be more user-friendly for the RF Field Engineers?"

Essentially, I'm asking them if they've really considered the overall cost of migrating to a Cloud Native model. This is an extremely important question, because when there's an issue with something like a RAN device, ideally you want the field technician to focus on the things they know best - troubleshooting radio/RF. Do you really want to walk them through troubleshooting a complex platform in real-time? Do you want to re-train all your RF Engineers, and what does this even look like in terms of cost and overall effort? A Kubernetes platform is supposed to make things easier, not get in the way, make things more complex, or force you to retrain your workforce. So then, how can we make this process reliable, repeatable, cost-effective, and still easy to install? As I always like to do, let's show this with a demonstration.

Part II: Re-thinking Bare Metal Deployments

There's a new sub-command coming to the openshift-install binary called agent, and this is the new feature that I'll be demonstrating today. I've pre-compiled a Linux version of the binary for you (disclaimer coming in the next section), but if you'd rather build it yourself, I will include a link in the Appendix at the bottom of this post describing how to do so. For this post, I want to make things as quick and easy as possible.

Proof of Concept Overview

DISCLAIMER: The work described in this blog post is dev-preview at this time (2022/08/20). Keep this in mind as you work through each of the following steps. If you have any issues with the work described below, please feel free to send me an EMAIL.

Also, I want you to think about two distinct groups as you work through this PoC:

  • Platform Operations: A team that provides platform details and manifests
  • Field Technicians: A team tasked with installing the OpenShift platform in the field

Platform Operations

The Platform Operations team could be anyone who is providing details for the platform installation. They could provide the initial manifests which will be used to generate the ISO, or they could simply provide the resulting ISO. This could actually be an automated process, depending on how a given service provider implements the overall solution. It doesn't have to be a person performing the work, but keep in mind that somehow the resulting manifests or ISO artifact will need to be given to a Field Technician for the remote site installation.

Field Technicians

A Field Technician is simply remote hands. In most cases, this will likely be an RF Field Engineer or Technician, but in practice they can be fairly non-technical for our purposes. We're not asking field technicians to be OpenShift or Kubernetes experts, and there's no additional retraining required to install the OpenShift platform at a remote site. The goal of this demonstration is to show that anyone can install OpenShift, so long as they can boot the installation media (either virtually, or through physical media).

Basic Workflow (Overview)

Now that we've discussed the target audience for our installation, I want to give a basic overview of the workflow. More details are covered in the following sections, where you can perform this work yourself (as an example). But there are really only three primary steps after some initial setup:

  1. Generate ISO installation media (from a single command)
  2. Hand over the resulting ISO image to a Field Technician
  3. Boot the installation media, and wait until installation is complete

We're going to cover the prerequisites for the openshift-install application first. Once it's installed correctly, there are very few requirements to get started. So let's walk through this initial, one-time setup now.

Part III: PoC Prerequisites

Download the openshift-install Binary

First, download the openshift-install binary from HERE. Be sure that the binary is in your PATH. You might notice that I am using a Python utility called gdown. This utility can make it a little easier to download the binary directly to your local RHEL-based system for testing. You're welcome to use whatever method works best for you. The following commands should work out of the box.

gdown 1-rJzzyVa9rWvw3yeAuGsTfaaSFLcMY6x
tar zxvf openshift-install.tgz
sudo chmod +x openshift-install
sudo chown root:root openshift-install
sudo mv openshift-install /usr/local/bin/openshift-install
export PATH=$PATH:/usr/local/bin
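
To confirm the binary is executable and found in your PATH before going any further, you can quickly check the reported version:

openshift-install version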

RHEL-based Prerequisite Packages

Once you've downloaded the binary (as described above), make sure to install the following package requirements on your RHEL-based target system. These packages are required: genisoimage will be leveraged to create the resulting ISO artifact, and nmstate will be used to read the networking instructions from the agent-config.yaml file. I'll provide more information about the agent-config.yaml file later (below).

sudo yum install genisoimage nmstate -y

Part IV: Demonstration

Step 1: Create Deployment Manifests

Now we are ready to create the manifests used for our deployment. The first thing I'd like you to do is create two directories. The first directory, cluster-manifests, will be used as our working directory. The second directory, cluster-manifests-backup, will be used to store copies of our YAML manifests (as a backup directory). This is because when openshift-install runs, it takes the YAML configuration files you provide and converts them into deployable manifests that are then added to the resulting ISO. So once you have a successfully built ISO, your original configuration files will no longer be at the root of the cluster-manifests directory. It's always nice to keep a backup of your original YAML manifests.

Let's create these directories for now, and you will understand this behavior better once you see it in action.

mkdir -p {./cluster-manifests/,./cluster-manifests-backup/}

Before customizing the following manifests (install-config.yaml and agent-config.yaml), I want to start you off with manifests that I am absolutely certain work. My goal with this article is to make sure you succeed in creating a bootable ISO image, so for now, use the exact same manifests as a baseline test. I will walk you through customizing this for your environment after you're able to successfully build your first bootable ISO.

Create manifest: install-config.yaml

NOTE: You will be writing the install-config.yaml file to the cluster-manifests directory. So make sure you're in the base directory where you created the two directories called cluster-manifests and cluster-manifests-backup.

[root@aio tmp]# mkdir -p {./cluster-manifests/,./cluster-manifests-backup/}
[root@aio tmp]# ls -asl
total 0
0 drwxr-xr-x. 4 root root  63 Aug 21 12:29 .
0 drwxr-xr-x. 8 root root 190 Aug 21 12:28 ..
0 drwxr-xr-x. 2 root root   6 Aug 21 12:29 cluster-manifests
0 drwxr-xr-x. 2 root root   6 Aug 21 12:29 cluster-manifests-backup
[root@aio tmp]# 

Copy and paste the following code block to create an initial install-config.yaml.

cat << EOF > ./cluster-manifests/install-config.yaml
apiVersion: v1
baseDomain: ztp.dfw.ocp.run
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 1
metadata:
  name: ztp-ocpsno58
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.3.0/24
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.31.0.0/16
platform:
  none: {}
pullSecret: '$OPENSHIFT_PULL_SECRET'
sshKey: '$SSH_PUBLIC_KEY'
EOF

Did you notice the variables $OPENSHIFT_PULL_SECRET and $SSH_PUBLIC_KEY in the manifest above? Change these to match your own values. You can obtain your OpenShift pull secret from the following link HERE. Look for a section in the main window called "Tokens". There, you will see a "Copy" link next to "Pull secret". Click on that link, and make sure to put the entire output in between the two single quotes in the manifest above (for the pullSecret key).

Do the same for the $SSH_PUBLIC_KEY. Be sure to place your public key in between the two single quotes.
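
As a tip: since the heredoc above is unquoted, your shell will expand these variables automatically if they're already exported. For example (the file paths here are just examples; point them at wherever your pull secret and public key actually live):

# example only: export the values before running the cat command above,
# and the shell will substitute them into the manifest for you
export OPENSHIFT_PULL_SECRET="$(cat ~/pull-secret.json)"
export SSH_PUBLIC_KEY="$(cat ~/.ssh/id_rsa.pub)"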

Create manifest: agent-config.yaml

NOTE: You will be writing the agent-config.yaml file to the cluster-manifests directory.

Next, create an agent-config.yaml in the same way that you did above.

cat << EOF > ./cluster-manifests/agent-config.yaml
apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: ztp-ocpsno58
rendezvousIP: 192.168.3.58
hosts:
  - hostname: vm-ocp-node58
    interfaces:
     - name: enp1s0
       macAddress: 0e:00:00:03:30:58
    networkConfig:
      interfaces:
        - name: enp1s0
          type: ethernet
          state: up
          mac-address: 0e:00:00:03:30:58
          ipv4:
            enabled: true
            address:
              - ip: 192.168.3.58
                prefix-length: 24
            dhcp: false
      dns-resolver:
        config:
          server:
            - 192.168.3.5
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: 192.168.3.1
            next-hop-interface: enp1s0
            table-id: 254
EOF

Make sure to copy each of these files into the ./cluster-manifests-backup/ directory as well.

cp ./cluster-manifests/* ./cluster-manifests-backup/

The result will look something like the following:

[root@aio demo]# tree .
.
├── cluster-manifests
│   ├── agent-config.yaml
│   └── install-config.yaml
└── cluster-manifests-backup
    ├── agent-config.yaml
    └── install-config.yaml

2 directories, 4 files
[root@aio demo]#

Step 2: Generate the OpenShift Installation Media

There's really only one thing left to do, and that is to generate the installation media! You can do this with the following command.

/usr/local/bin/openshift-install agent create image --log-level debug --dir cluster-manifests

Once completed, have a look at the cluster-manifests directory. You will notice that your initial manifests have been replaced with an agent.iso file, and an ./auth directory which includes the kubeconfig file.

[root@aio demo]# tree -a .
.
├── cluster-manifests
│   ├── agent.iso
│   ├── auth
│   │   └── kubeconfig
│   ├── .openshift_install.log
│   └── .openshift_install_state.json
└── cluster-manifests-backup
    ├── agent-config.yaml
    └── install-config.yaml

3 directories, 6 files
[root@aio demo]# 

You will also notice a couple of hidden files: an .openshift_install.log file, as well as an .openshift_install_state.json file. Keep the latter file safe, as it includes sensitive data such as the OpenShift pull secret.

Step 3: Boot OpenShift Installation Media

All that is left to do now is to boot the bare metal machine or virtual machine to the agent.iso file located in the cluster-manifests directory. To demonstrate this process, I'll be using a RHEL-based QEMU environment.

Now, all my virtual machines are controlled using Redfish. I do this by running sushy-tools, which is running locally on my hypervisor. I wrote a whole blog post about doing this HERE, and you can learn more about controlling virtual machines with Redfish from another blog post I wrote HERE.

Let's assume that we already have a virtual machine created and controlled via Redfish. This is very similar to how a bare metal host can be managed as well. So the first thing I'm going to do, like always, is set some variables.

REDFISH_HOST="192.168.3.99"
REDFISH_PORT="8000"
REDFISH_SYSTEM="00000000-0000-0000-0030-000000000058"
REDFISH_MANAGER="00000000-0000-0000-0030-000000000058"
ISO_URL="http://192.168.3.99:8081/agent.iso"

Now let's first turn off the target system with the following command:

curl -s -d '{"ResetType":"ForceOff"}'  \
    -H "Content-Type: application/json" -X POST  \
    http://$REDFISH_HOST:$REDFISH_PORT/redfish/v1/Systems/$REDFISH_SYSTEM/Actions/ComputerSystem.Reset 
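
If you want to confirm the host actually powered down, you can query its PowerState (a standard Redfish property that sushy-tools exposes):

curl -s http://$REDFISH_HOST:$REDFISH_PORT/redfish/v1/Systems/$REDFISH_SYSTEM | jq '.PowerState'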

With the system off, you can verify the ISO status (if it's mounted or not):

curl -s http://$REDFISH_HOST:$REDFISH_PORT/redfish/v1/Managers/$REDFISH_MANAGER/VirtualMedia/Cd/ | jq  '[{iso_connected: .Inserted}]'

You should get back a response that looks like the following example:

[
  {
    "iso_connected": false
  }
]

Let's get our ISO mounted:

curl -d \
    '{"Image":"'"$ISO_URL"'", "Inserted": true}' \
     -H "Content-Type: application/json" \
     -X POST \
     http://$REDFISH_HOST:$REDFISH_PORT/redfish/v1/Managers/$REDFISH_MANAGER/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia 

With the ISO mounted via URL, you can now use virt-manager to boot to the ISO; select the Play icon to start the virtual machine.
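
Alternatively, if you'd rather skip virt-manager and stay on the command line, the same can be done over Redfish. Here's a sketch using two standard Redfish calls (both supported by sushy-tools): first set the virtual CD as the boot device, then power the system on.

# set the virtual CD-ROM as the persistent boot device
curl -s -d '{"Boot": {"BootSourceOverrideTarget": "Cd", "BootSourceOverrideEnabled": "Continuous"}}' \
    -H "Content-Type: application/json" -X PATCH \
    http://$REDFISH_HOST:$REDFISH_PORT/redfish/v1/Systems/$REDFISH_SYSTEM

# power the system on; it will boot from the mounted agent.iso
curl -s -d '{"ResetType":"On"}' \
    -H "Content-Type: application/json" -X POST \
    http://$REDFISH_HOST:$REDFISH_PORT/redfish/v1/Systems/$REDFISH_SYSTEM/Actions/ComputerSystem.Reset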

Post-Deployment Troubleshooting

So the virtual machine is up and running, but how do we see what's going on? Do you remember the cluster-manifests/auth/ directory that was created? There's a kubeconfig file located in this directory. Let's use it now to check on the cluster.

Using openshift-install For Cluster Status

The first and most commonly used way to watch cluster events during installation is to leverage the agent utility once again. This can be done with a useful sub-command called wait-for. There are two options for the wait-for sub-command:

  • bootstrap-complete
  • install-complete

Agent Command: bootstrap-complete

Leverage bootstrap-complete during the bootstrap phase (prior to using install-complete). This will monitor bootstrap events and report them back to openshift-install agent in real time. Output should look similar to the following when all is going well:

[root@aio auth]# openshift-install agent wait-for bootstrap-complete --dir /root/deployments/openshift/agent/demo/cluster-manifests 
INFO Checking for validation failures ----------------------------------------------
ERROR Validation failure found for cluster          category=hosts-data label=all-hosts-are-ready-to-install message=The cluster has hosts that are not ready to install.
ERROR Validation failure found for vm-ocp-node58    category=network label=DNS wildcard not configured message=Parse error for domain name resolutions result
ERROR Validation failure found for vm-ocp-node58    category=network label=NTP synchronization message=Host couldn't synchronize with any NTP server
INFO Checking for validation failures ----------------------------------------------
ERROR Validation failure found for cluster          category=hosts-data label=all-hosts-are-ready-to-install message=The cluster has hosts that are not ready to install.
INFO Pre-installation validations are OK
INFO Cluster is ready for install
INFO Host vm-ocp-node58: updated status from insufficient to known (Host is ready to be installed)
INFO Preparing cluster for installation
INFO Host vm-ocp-node58: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
INFO Host vm-ocp-node58: New image status quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b6f2a69fdc1a0844565320fc51316aa79ad6d4661326b30fa606123476c3d9f7. result: success. time: 2.38 seconds; size: 378.98 Megabytes; download rate: 166.94 MBps
INFO Host vm-ocp-node58: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
INFO Cluster installation in progress
INFO Host vm-ocp-node58: updated status from preparing-successful to installing (Installation is in progress)
INFO Host: vm-ocp-node58, reached installation stage Installing: bootstrap
INFO Host: vm-ocp-node58, reached installation stage Waiting for bootkube
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 5%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 10%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 15%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 22%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 28%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 33%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 38%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 43%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 48%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 53%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 59%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 64%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 69%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 74%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 79%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 85%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 90%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 95%
INFO Host: vm-ocp-node58, reached installation stage Writing image to disk: 100%
INFO Host: vm-ocp-node58, reached installation stage Rebooting

The basic rule that I personally follow is that once the image is being written to disk, the installation will more than likely succeed, barring any cluster configuration issues of course. When the bootstrap-complete command reports that the host is Rebooting, you can start leveraging the install-complete sub-command.

Agent Command: install-complete

Leverage the install-complete sub-command to monitor the cluster installation process after the bootstrap process has completed.
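
The invocation mirrors the bootstrap-complete example shown above:

openshift-install agent wait-for install-complete --dir /root/deployments/openshift/agent/demo/cluster-manifests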

Direct Host Troubleshooting

One other option that can be leveraged during the installation, as always, is direct host access monitoring. As you may have noticed in our install-config.yaml manifest, an SSH public key was declared (the sshKey field). You can use this to access the host at any point during the installation process for the cluster.
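
For example, RHCOS provisions the key for the core user, so in this demonstration you could reach the node over its rendezvous IP from our agent-config.yaml:

ssh core@192.168.3.58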

Probably one of the most interesting aspects of leveraging this method for troubleshooting is what you will find running on the RHCOS host during the initial boot: the now very popular Assisted-Service.

[core@vm-ocp-node58 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                                                                                                   COMMAND               CREATED         STATUS             PORTS                                                                   NAMES
1647bf195525  localhost/podman-pause:4.0.2-1655280389                                                                                                       37 seconds ago  Up 27 seconds ago  0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp  5f1b51de8531-infra
14d1df9b175d  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d8def7241e1707b99bd445a4764477cb24c180175836fae35589fedb1ad892df  /bin/bash start_d...  14 seconds ago  Up 14 seconds ago  0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp  assisted-db
725d5f0fd2bf  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d8def7241e1707b99bd445a4764477cb24c180175836fae35589fedb1ad892df  /assisted-service     14 seconds ago  Up 14 seconds ago  0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp  service
610bb4d4c571  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d8def7241e1707b99bd445a4764477cb24c180175836fae35589fedb1ad892df  /usr/local/bin/ag...  6 seconds ago   Up 6 seconds ago   0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp  apply-host-config
ec05ce2d17b8  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:51d89602f11aaae4b3e950f54274567d19bec74b86dad054e60ce56c893476b3  next_step_runner ...  3 seconds ago   Up 3 seconds ago                                                                           next-step-runner
[core@vm-ocp-node58 ~]$ 

You see, the ISO which is generated with the openshift-install binary has everything it needs to create a full cluster (in terms of bootstrapping). If you're familiar with the Assisted-Service for bare metal deployments, then you will notice this work is reused to simplify the overall deployment of OpenShift out in the field. I'll write more about how this is useful, and what else it can be leveraged for, in an upcoming blog post.

Accessing the Cluster

No doubt you've already noticed the kubeconfig located in the auth/ directory after openshift-install agent ran successfully.

But you may be asking yourself: "Where can I find the kubeadmin password?"

This is currently something being worked out by the engineering team, but for now we do have a useful workaround. Generally, this method can be used to recover (really, replace) any kubeadmin password, so long as you have the kubeconfig file. So let's patch in a new kubeadmin password, so we can access the cluster.

First, export the kubeconfig that was generated from the openshift-install agent command:

export KUBECONFIG=/root/deployments/openshift/agent/demo/cluster-manifests/auth/kubeconfig
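
With the kubeconfig exported, you can already reach the cluster with oc, even before recovering the kubeadmin password:

oc get nodes
oc get clusterversion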

Next, run the following container utility (I'm obviously using Podman, but it doesn't matter what you use):

podman run -t quay.io/bjozsa-redhat/kubeadmin-reset:latest

The result should look like the following:

[root@aio auth]# podman run -t quay.io/bjozsa-redhat/kubeadmin-reset:latest
Actual Password: qSxz5-EngnK-oyuCe-zxePE
Hashed Password: $2a$10$6f6FiC7fYrIdD7ba08J4vOKoFQQjAZn6WfxItdbkhtnan6FkBi8P2
Data to Change in Secret: JDJhJDEwJDZmNkZpQzdmWXJJZEQ3YmEwOEo0dk9Lb0ZRUWpBWm42V2Z4SXRkYmtodG5hbjZGa0JpOFAy
[root@aio auth]#

Next, use the data provided on the line entitled "Data to Change in Secret", and patch this into the following command:

[root@aio auth]# oc patch secret -n kube-system kubeadmin --type json -p '[{"op": "replace", "path": "/data/kubeadmin", "value": "JDJhJDEwJDZmNkZpQzdmWXJJZEQ3YmEwOEo0dk9Lb0ZRUWpBWm42V2Z4SXRkYmtodG5hbjZGa0JpOFAy"}]'
secret/kubeadmin patched
[root@aio auth]# 

And now you should be able to access the WebUI with your newly created kubeadmin password!
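
As an aside, if you'd rather not run a third-party container image, the same three lines of output can be produced locally. Here's a rough sketch of what the utility does, assuming httpd-tools is installed for the htpasswd command (note that htpasswd emits a $2y$ bcrypt prefix rather than the $2a$ shown above, which should validate just the same for this purpose):

# generate a random password, bcrypt-hash it (cost 10), and
# base64-encode the hash for patching into the kubeadmin secret
PASSWORD="$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 20)"
HASH="$(htpasswd -bnBC 10 '' "$PASSWORD" | tr -d ':\n')"
echo "Actual Password: $PASSWORD"
echo "Hashed Password: $HASH"
echo "Data to Change in Secret: $(echo -n "$HASH" | base64 -w0)"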

Final Thoughts

And that's it! Seriously. With this solution, service providers aren't required to re-train RF Engineers/Technicians to be Kubernetes experts. In fact, this process makes OpenShift/Kubernetes feel more like installing an appliance than a full OS. This is the difference in where OpenShift is in the market today: we're working with real customers, and being empathetic to their current organizational structure and workflow. Rather than asking our users to bend to what we think is best, we're adjusting our software to work best for them. It's what makes working at Red Hat, and treating our customers as legitimate partners, so rewarding.

Appendix

I don't want to leave you with a demonstration that isn't useful beyond this post; rather, I want to help you customize your deployments as well. Consider these follow-up articles, now that you have a grasp of what we're doing.

  • OpenShift-Install Agent: Deep Dive and Technical Troubleshooting
  • OpenShift-Install Agent: Overriding Deployed Cluster Version
  • OpenShift-Install Agent: Leveraging ZTP Manifests
  • OpenShift-Install Agent: Generating First Time Kubeadmin Password
  • OpenShift-Install Agent: Compact and Multi-Node Cluster Deployments
  • OpenShift-Install Agent: Using a Private Registry
