Upgrading Managed Kubernetes Clusters - Azure vs AWS
At Payara we were developing a managed cloud runtime that ran across different providers and different regions. We first started building on Azure and then added AWS to the mix. In the early days we would just work from the management console, then we migrated to home-built management scripts (because I was too lazy to embrace Terraform), but finally we moved on to managing infra with Pulumi.
What follows is an excerpt from our operations guide on Kubernetes upgrades on both platforms, because I found myself amused by the level of sarcasm I'd put into the AWS one:
Upgrading AKS cluster
To verify available upgrades run
$ az aks get-upgrades -g resourceGroup -n clusterName --output table
Name ResourceGroup MasterVersion Upgrades
------- ------------------- --------------- --------------
default dev02-rg-westeurope 1.19.13 1.21.7, 1.21.9
NOTE: Oh boy, this was a long time ago.
To execute the upgrade run
$ az aks upgrade -g resourceGroup -n clusterName --kubernetes-version 1.21.9
This is an interactive command that asks for confirmation and keeps spinning until the upgrade finishes. Also consider watching cluster events during that operation, as well as monitoring application downtime.
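For example, in a second terminal you can follow events and node status while the upgrade runs (assuming kubectl is already pointed at the cluster):

```shell
# Watch cluster-wide events as nodes get cordoned, drained and replaced
kubectl get events --all-namespaces --watch

# In another pane, watch node versions roll over one by one
kubectl get nodes --watch
```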
Upgrading AWS cluster
The reference documentation for updating EKS outlines many steps. Despite the Kubernetes upgrade reminder being the most prominent UI element on the EKS page in the AWS console, in true AWS fashion the process is not trivial and requires a lot of manual work.
Let's break the process down into smaller steps:
Upgrade API Server
Upgrade control plane:
eksctl upgrade cluster --name clusterName
That doesn't do anything; it just verifies that it would do something.
eksctl upgrade cluster --name clusterName --approve
Now we're talking!
Before upgrading nodes, verify that prefix delegation is enabled in the cluster. This is one of those things that are configured by default everywhere else, but need special attention in AWS.
You can apply the change directly as outlined in the linked document, but if you find that dirty -- and you should -- you can do it in the console instead, which is no cleaner, but at least you can see what you're doing. In the EKS add-ons, find the CNI plugin, edit it, and in its configuration set ENABLE_PREFIX_DELEGATION to true. You need to conform to the provided JSON schema. If you don't feel like parsing a JSON schema in your head to set a single flag, I already parsed it for you; the JSON to put in the configuration field is:
{
"env": {
"ENABLE_PREFIX_DELEGATION": "true"
}
}
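If you'd rather script it than click through the console, the same flag can also be set with the CLI; a sketch, assuming the CNI plugin is registered under its usual add-on name vpc-cni:

```shell
# Apply the prefix-delegation flag to the VPC CNI add-on,
# preserving any configuration values already customized on the cluster
aws eks update-addon \
  --cluster-name clusterName \
  --addon-name vpc-cni \
  --configuration-values '{"env":{"ENABLE_PREFIX_DELEGATION":"true"}}' \
  --resolve-conflicts PRESERVE
```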
Upgrade Nodes
Upgrade nodes per documentation:
eksctl upgrade nodegroup --name mng-1 --cluster clusterName --region region --force-upgrade
This failed on the last attempt because we're using Ubuntu and therefore a CUSTOM launch type.
NOTE: Our installation required AppArmor, and that's why it got a little more complicated than an out-of-the-box EKS cluster. But given the magnitude of the upgrade work, it didn't add much.
So that command is of no use. Instead, we wrote a script to update the AMI ID in the launch template; you'll find it below. But first, this is how to do it by hand:
$ aws eks describe-nodegroup --cluster-name clusterName --nodegroup-name mng-1 --query "nodegroup.launchTemplate.{id:id,version:version}"
{
"id": "lt-051d5efa1c6d52fea",
"version": "1"
}
# This is just FYI, so you can see what's inside it -- especially the userdata that eksctl once put together.
$ aws ec2 describe-launch-template-versions --launch-template-id lt-051d5efa1c6d52fea --versions 1
With that you get basic information about your launch template. Next, you need to find the AMI ID for the new node version. The images are usually named like ubuntu-eks/k8s_1.31/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-<releaseDate>. You can search for them in the AWS console under "Create new Launch template", but that will not give you the ID, because that would be too bloody obvious. Go to Images > AMIs instead, where you can search for owner 099720109477 and an AMI name with the appropriate prefix.
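If you'd rather skip the console entirely, the same search works from the CLI; a sketch that sorts the matches by creation date and prints only the newest AMI ID:

```shell
# Find the newest Ubuntu EKS image for Kubernetes 1.31 from Canonical (owner 099720109477)
aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu-eks/k8s_1.31/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
  --query "sort_by(Images, &CreationDate)[-1].ImageId" \
  --output text
```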
Then apply it to a template and update nodegroup:
# We used PowerShell a lot; that's why there are backticks at the ends of lines
aws ec2 create-launch-template-version `
--launch-template-id lt-051d5efa1c6d52fea `
--source-version 1 `
--launch-template-data "ImageId=ami-04ba3ea84b4276ea4" `
--query "LaunchTemplateVersion.VersionNumber" `
--output text
aws eks update-nodegroup-version `
--cluster-name clusterName `
--nodegroup-name mng-1 `
--launch-template "id=lt-051d5efa1c6d52fea,version=2"
Then wait about 20 minutes until the new nodes get some workloads migrated onto them. Or they just disappear, and you need to go hunting for an error message.
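When they do disappear, the nodegroup health field is the first place to look for that error message; a sketch:

```shell
# Surface any health issues AWS recorded for the nodegroup
aws eks describe-nodegroup \
  --cluster-name clusterName \
  --nodegroup-name mng-1 \
  --query "nodegroup.health.issues" \
  --output table
```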
Upgrading Extensions
Also, you definitely need to update the cluster autoscaler, as an outdated one may work against you, and it really is upgraded by editing a deployment manually:
First, check for a release matching the new Kubernetes version in the release list.
kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.1 # or similar version
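After setting the image, it's worth confirming that the rollout actually converged and the new binary came up healthy; for example:

```shell
# Wait for the new autoscaler pod to become ready
kubectl -n kube-system rollout status deployment/cluster-autoscaler

# Skim the logs for the version banner and startup errors
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=20
```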
Then go and manually update the EKS add-ons. You will again need to choose the appropriate version from some random list in GitHub or the documentation:
- VPC CNI
- Kube-proxy
- CloudWatch observability -- here the situation is more complicated; you're supposed to run
aws eks describe-addon-versions --addon-name amazon-cloudwatch-observability --kubernetes-version 1.32
and choose based on that.
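The output of that command is verbose; a --query expression can narrow it down to just the version strings (the same idea works for the other add-ons too):

```shell
# List the add-on versions available for this Kubernetes version
aws eks describe-addon-versions \
  --addon-name amazon-cloudwatch-observability \
  --kubernetes-version 1.32 \
  --query "addons[0].addonVersions[*].addonVersion" \
  --output text
```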
The node upgrade script
Note that as of 2026, the jammy-based images are no longer available, so you'll need to tweak it a bit for your needs.
param (
[Parameter(Mandatory=$true)]
[string]$Region,
[Parameter(Mandatory=$true)]
[string]$ClusterName,
[Parameter(Mandatory=$true)]
[string]$KubernetesVersion,
[Parameter(Mandatory=$false)]
[string]$NodegroupName = "mng-1"
)
# Get the current nodegroup details to find the launch template
Write-Host "Getting current nodegroup details for $NodegroupName in cluster $ClusterName..." -ForegroundColor Cyan
$nodegroup = aws eks describe-nodegroup --cluster-name $ClusterName --nodegroup-name $NodegroupName --region $Region | ConvertFrom-Json
$launchTemplateId = $nodegroup.nodegroup.launchTemplate.id
$currentVersion = $nodegroup.nodegroup.launchTemplate.version
Write-Host "Current launch template: $launchTemplateId (version $currentVersion)" -ForegroundColor Green
# Search for the AMIs from Canonical for EKS
Write-Host "Searching for Ubuntu EKS AMIs..." -ForegroundColor Cyan
$amis = aws ec2 describe-images `
--owners 099720109477 `
--filters "Name=name,Values=ubuntu-eks/k8s_$($KubernetesVersion)/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" `
--query "Images[*].[ImageId,Name,CreationDate]" `
--output json --region $Region | ConvertFrom-Json
# Sort by creation date descending to get the newest first
$sortedAmis = $amis | Sort-Object -Property {[DateTime]$_[2]} -Descending | Select-Object -First 5
# Display the AMIs and let the user select one
Write-Host "Found $(($sortedAmis | Measure-Object).Count) AMIs. Please select one:" -ForegroundColor Yellow
for ($i=0; $i -lt $sortedAmis.Count; $i++) {
$ami = $sortedAmis[$i]
Write-Host "[$i] $($ami[0]) - $($ami[1]) - Created: $($ami[2])"
}
$selection = Read-Host "Enter the number of the AMI you want to use [0-$($sortedAmis.Count - 1)]"
$selectedAmi = $sortedAmis[$selection]
$selectedAmiId = $selectedAmi[0]
Write-Host "You selected: $selectedAmiId - $($selectedAmi[1])" -ForegroundColor Green
# Confirm before proceeding
$confirmation = Read-Host "Do you want to proceed with creating a new launch template version and upgrading the nodegroup? (y/n)"
if ($confirmation -ne 'y') {
Write-Host "Operation canceled." -ForegroundColor Red
exit
}
# Create a new launch template version
Write-Host "Creating a new launch template version..." -ForegroundColor Cyan
$newVersionResult = aws ec2 create-launch-template-version `
--launch-template-id $launchTemplateId `
--source-version $currentVersion `
--launch-template-data "ImageId=$selectedAmiId" `
--query "LaunchTemplateVersion.VersionNumber" `
--output text --region $Region
$newVersion = $newVersionResult
Write-Host "Created new launch template version: $newVersion" -ForegroundColor Green
# Update the nodegroup
Write-Host "Updating nodegroup with new launch template version..." -ForegroundColor Cyan
aws eks update-nodegroup-version `
--cluster-name $ClusterName `
--nodegroup-name $NodegroupName `
--launch-template "id=$launchTemplateId,version=$newVersion" `
--region $Region
Write-Host "Nodegroup update initiated. You can monitor the progress in the EKS console." -ForegroundColor Green
Write-Host "Run this command to check status: aws eks describe-nodegroup --cluster-name $ClusterName --nodegroup-name $NodegroupName --region $Region --query 'nodegroup.status'" -ForegroundColor Yellow
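Assuming the script above is saved as, say, Update-NodegroupAmi.ps1 (the filename is my invention), a run looks like:

```shell
pwsh ./Update-NodegroupAmi.ps1 -Region eu-west-1 -ClusterName clusterName -KubernetesVersion 1.31
```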
Conclusion
With great joy we migrated this process to Pulumi's eks.Cluster, which upgrades nodes automatically; we just add the autoscaler and its required policies on top. It still ended up being a few hundred lines of TypeScript.