kloia Blog

How to Enable Live Migration in KubeVirt with AWS FSx and OpenShift

Written by Bilal Unal | Oct 17, 2024 6:35:03 AM

In today's computing landscape, the ability to live-migrate virtual machines (VMs) is essential for maintaining high availability and minimizing downtime during maintenance. KubeVirt, an extension of Kubernetes, brings VM management into the containerized world, enabling unified control over both virtualized and containerized workloads.

In this guide, we’ll walk you through setting up a scalable infrastructure on AWS that supports the Live Migration feature in KubeVirt, utilizing AWS FSx for NetApp ONTAP and OpenShift Container Platform (OCP). By the end of this tutorial, you’ll have the tools to live migrate VMs effortlessly within your Kubernetes cluster, ensuring high availability and reliability in your cloud environment.

Tech Stack Overview

The following technologies are crucial to this setup:

  • AWS Bare Metal EC2 Instances
    Provide the hardware virtualization support that KubeVirt needs; on AWS this is only available on bare-metal (.metal) instance types.
  • AWS FSx for NetApp ONTAP
    Offers a fully managed shared file system with ReadWriteMany access, essential for live migration.
  • OpenShift Container Platform (OCP)
    A Kubernetes-based container orchestration platform that simplifies application deployment and management.
  • KubeVirt
    Extends Kubernetes by allowing it to manage virtual machines as native Kubernetes resources.
  • Trident CSI Driver
    A Container Storage Interface (CSI) driver from NetApp that integrates with Kubernetes to manage storage provisioning.

Prerequisites

Before proceeding, ensure you have:

  • Basic knowledge of Kubernetes, OpenShift, and AWS services.
  • A clone of the project repository, which contains all the necessary configuration files and templates:
    git clone https://github.com/kloia/aws-ocp-kubevirt-fsx.git
    cd aws-ocp-kubevirt-fsx
  • An AWS account with the necessary permissions to create EC2 instances, FSx file systems, and VPC configurations.
  • The openshift-install, oc (or kubectl), virtctl, helm, and aws command-line tools installed on your workstation.

Deployment

  • Setting Up the Environment
  • Installation
  • Deploying the Trident CSI Driver
  • Deploying KubeVirt

  • Live Migration of VMs with KubeVirt

Setting Up the Environment

Download the OpenShift Installer

First, download the OpenShift installer for your platform (the example below targets macOS on Apple Silicon; adjust the archive name for your OS and architecture):


curl -fsSLO https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.14.4/openshift-install-mac-arm64-4.14.4.tar.gz
tar xzf openshift-install-mac-arm64-4.14.4.tar.gz
./openshift-install --help

Create the Installation Configuration

Create a directory for your OpenShift manifests:


mkdir -p ocp-manifests-dir/

Save the following install-config.yaml file inside ocp-manifests-dir/:

apiVersion: v1
baseDomain: yourdomain.com
compute:
- name: worker
  platform:
    aws:
      type: c5.metal
  replicas: 2
controlPlane:
  name: master
  platform: {}
  replicas: 3
metadata:
  name: ocp-demo
networking:
  networkType: OVNKubernetes
platform:
  aws:
    region: your-aws-region
publish: External
pullSecret: 'your-pull-secret'
sshKey: 'your-ssh-key'

Note: Replace placeholders like yourdomain.com, your-aws-region, your-pull-secret, and your-ssh-key with your actual values.

Generate Manifests

Generate the OpenShift manifests:


./openshift-install create manifests --dir ocp-manifests-dir

Installation

Install the OpenShift Cluster

Back up your installation configuration (openshift-install consumes the directory contents during installation, so a copy is useful if you need to re-run):


cp -r ocp-manifests-dir/ ocp-manifests-dir-bkp

 

Start the cluster installation:


./openshift-install create cluster --dir ocp-manifests-dir --log-level debug

Provision AWS FSx for NetApp ONTAP

Live migration needs shared storage that two worker nodes can mount at the same time; FSx for NetApp ONTAP provides this through NFS volumes with ReadWriteMany access, and a Multi-AZ deployment keeps the file system highly available. Navigate to the fsx directory and create the FSx file system using AWS CloudFormation:


cd fsx

aws cloudformation create-stack \
  --stack-name FSXONTAP \
  --template-body file://./netapp-cf-template.yaml \
  --region your-aws-region \
  --parameters \
    ParameterKey=Subnet1ID,ParameterValue=subnet-xxxxxxxx \
    ParameterKey=Subnet2ID,ParameterValue=subnet-yyyyyyyy \
    ParameterKey=myVpc,ParameterValue=vpc-zzzzzzzz \
    ParameterKey=FSxONTAPRouteTable,ParameterValue=rtb-aaaaaaa,rtb-bbbbbbb \
    ParameterKey=FileSystemName,ParameterValue=myFSxONTAP \
    ParameterKey=ThroughputCapacity,ParameterValue=256 \
    ParameterKey=FSxAllowedCIDR,ParameterValue=0.0.0.0/0 \
    ParameterKey=FsxAdminPassword,ParameterValue=YourFSxAdminPassword \
    ParameterKey=SvmAdminPassword,ParameterValue=YourSvmAdminPassword \
  --capabilities CAPABILITY_NAMED_IAM

Note: Replace the parameter values with your actual AWS resource IDs and desired passwords. For anything beyond a demo, restrict FSxAllowedCIDR to your VPC CIDR instead of 0.0.0.0/0.
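
Stack creation takes several minutes. You can poll its status and, once complete, read the outputs (such as the SVM management and NFS DNS names, assuming the template exports them) with:


aws cloudformation describe-stacks \
  --stack-name FSXONTAP \
  --region your-aws-region \
  --query "Stacks[0].[StackStatus,Outputs]"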

Deploying the Trident CSI Driver

Set KUBECONFIG Environment Variable


export KUBECONFIG=$(pwd)/ocp-manifests-dir/auth/kubeconfig
kubectl get nodes

Install Trident Operator

Create the trident namespace and install the Trident CSI driver:


oc create ns trident
curl -L -o trident-installer.tar.gz https://github.com/NetApp/trident/releases/download/v22.10.0/trident-installer-22.10.0.tar.gz
tar -xvf trident-installer.tar.gz
cd trident-installer/helm
helm install trident -n trident trident-operator-22.10.0.tgz
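
Before continuing, confirm that the operator has rolled Trident out successfully:


oc get pods -n trident
oc get tridentversions -n trident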

Create Secrets for Backend Access

Create a svm_secret.yaml file with the following content:


apiVersion: v1
kind: Secret
metadata:
  name: backend-fsx-ontap-nas-secret
  namespace: trident
type: Opaque
stringData:
  username: vsadmin
  password: YourSvmAdminPassword

Apply the secret:


oc apply -f svm_secret.yaml

Deploy the Trident Backend Configuration

Edit backend-ontap-nas.yaml in the fsx directory, replacing placeholders with your FSx for ONTAP details:


apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-fsx-ontap-nas
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: management-dns-name
  dataLIF: nfs-dns-name
  svm: svm-name
  credentials:
    name: backend-fsx-ontap-nas-secret

Apply the backend configuration:


oc apply -f fsx/backend-ontap-nas.yaml

 

Verify the backend status:


oc get tridentbackends -n trident


Create a Storage Class

Create a storage class by applying storage-class-csi-nas.yaml:


oc apply -f fsx/storage-class-csi-nas.yaml
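
For reference, a Trident StorageClass for this backend typically looks like the following. This is an illustrative sketch; the file in the repository may differ:


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-csi-nas
provisioner: csi.trident.netapp.io
parameters:
  backendType: ontap-nas
allowVolumeExpansion: true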

 

Verify the storage class:


oc get sc

Deploying KubeVirt

Install the OpenShift Virtualization operator, which deploys KubeVirt, in the openshift-cnv namespace:


echo '
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.14.0
  channel: "stable"' | oc apply -f -
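
You can watch the operator pods come up with:


oc get pods -n openshift-cnv -w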

Wait for all pods in openshift-cnv to be ready, then create the HyperConverged resource:


echo '
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec: {}' | oc apply -f -

Verify the installation:


oc get csv -n openshift-cnv
oc get kubevirt -n openshift-cnv
oc get HyperConverged -n openshift-cnv

Live Migration of VMs with KubeVirt

Scenario

You have two VMs running on separate bare-metal worker nodes in your OpenShift cluster. You need to perform maintenance on WorkerA and want to live-migrate its VM to WorkerB without downtime.

Challenge

KubeVirt VMs run inside Kubernetes pods (virt-launcher pods). When a VM moves to a different node, it gets a new pod with a new IP address, disrupting connectivity.

Solution

To maintain continuous network connectivity during migration, we'll add a second network interface to the VMs using a NetworkAttachmentDefinition (NAD). This secondary interface will have a static IP, ensuring seamless communication post-migration.

Create NetworkAttachmentDefinition (NAD)

Create a namespace for your VMs:


oc create ns vm-test

 

The NAD configuration (virtualization/nad.yaml) looks like this:


apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: static-eth1
  namespace: vm-test
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "bridge",
    "bridge": "br1",
    "ipam": {
      "type": "static"
    }
  }'

Apply the NAD. Note that the bridge CNI assumes a Linux bridge named br1 already exists on each worker node (for example, one created with the Kubernetes NMState Operator):


oc apply -f virtualization/nad.yaml

Create VMs with Dual NICs

Create two VMs, each with two network interfaces:


oc apply -f virtualization/vm-rhel-9-dual-nic.yaml
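
For reference, the part of the VM spec that wires up the two NICs typically looks like this. This is a sketch assuming the NAD above; the actual manifest in the repository may differ:


spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              masquerade: {}   # primary interface on the pod network (eth0)
            - name: static-net
              bridge: {}       # secondary interface (eth1), attached via the NAD
      networks:
        - name: default
          pod: {}
        - name: static-net
          multus:
            networkName: static-eth1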

Verify the VMs are running:


oc get vm -n vm-test

Assign IP Addresses to Secondary NICs

Access each VM console and assign static IPs to eth1:

VM A:


virtctl console -n vm-test rhel9-dual-nic-a

Inside the VM:


sudo ip addr add 192.168.1.10/24 dev eth1

VM B:


virtctl console -n vm-test rhel9-dual-nic-b

Inside the VM:


sudo ip addr add 192.168.1.11/24 dev eth1
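
Note that ip addr add does not survive a reboot. On RHEL 9 you can persist the address with NetworkManager instead, adjusting the address for each VM; this is an illustrative example:


sudo nmcli con add type ethernet ifname eth1 con-name eth1-static \
  ipv4.method manual ipv4.addresses 192.168.1.11/24
sudo nmcli con up eth1-static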

Connectivity Test

From VM A, ping VM B:


ping 192.168.1.11

From VM B, ping VM A:


ping 192.168.1.10

Successful replies confirm network connectivity over the secondary interfaces.

Live Migration

Now, initiate live migration of VM A; the scheduler will move it to another eligible node (WorkerB):


virtctl migrate rhel9-dual-nic-a -n vm-test


Monitor the migration status:


oc get vmim -n vm-test
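
Once the migration completes, confirm that the VM instance is now running on the other worker node:


oc get vmi -n vm-test -o wide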

In conclusion, by integrating AWS FSx for NetApp ONTAP with OpenShift and KubeVirt, we've successfully enabled live migration of virtual machines (VMs) within a Kubernetes cluster. Utilizing a secondary network interface with a static IP ensured continuous network connectivity during migrations, allowing for seamless maintenance and scaling operations without disrupting running applications.

This robust setup harnesses the power of AWS managed services and open-source technologies to deliver a scalable, resilient infrastructure ideal for modern cloud-native workloads, ensuring high availability and operational efficiency.