How to Upgrade EKS
Problem
EKS Clusters need to be upgraded to stay up to date with the newest features. AWS only supports the latest 3 major releases, so staying up to date is important. Upgrading may require not only updating the cluster, but the client tools, operators, API versions, addons, annotations, and AMIs.
Prerequisites
Before performing any kubernetes upgrade, make sure:
-
There are no breaking changes such as API deprecations that affect your deployed services.
-
Ensure there are no unconfirmed Spacelift runs relating to the EKS component, services running on top of EKS, or anything else EKS depends on (e.g., VPC, Global Accelerator, WAF)
-
Ensure all pods are healthy (e.g., not in a crash loop) before upgrading
A list of deprecations by version is available in the official Kubernetes https://kubernetes.io/docs/reference/using-api/deprecation-guide/.
You can use tools like pluto
by Fairwinds or kube-no-trouble
https://github.com/FairwindsOps/pluto https://github.com/doitintl/kube-no-trouble
Here’s an example of Pluto: (See docs for more details)
pluto detect-helm -owide
NAME NAMESPACE KIND VERSION REPLACEMENT DEPRECATED DEPRECATED IN REMOVED REMOVED IN
cert-manager/cert-manager-webhook cert-manager MutatingWebhookConfiguration admissionregistration.k8s.io/v1beta1 admissionregistration.k8s.io/v1 true v1.16.0 false v1.19.0
Solution
Simply update the cluster_kubernetes_version
and addons
, the apply the changes with Spacelift or Atmos.
Decide on Kubernetes Version
Kubernetes releases new minor versions (e.g. 1.21), as generally available approximately every three months. Each minor version is supported for approximately twelve months after its first release. Amazon adds support for the latest version very quickly after release, but typically after the first patch version release (e.g. 1.XX.1), but sometimes later.
https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
Also, understand Amazon's EKS version lifecycle
Note the latest Kubernetes release may not yet be available for EKS.
https://github.com/kubernetes/kubernetes/releases
Upgrade kubectl
Version
We need to install the version of kubectl
that is at least one minor version away from the cluster_kubernetes_version
in Geodesic. Installation instructions will vary depending on if you’re using Alpine, Ubuntu, or CentOS. For the most current list of kubectl-1.x
versions, check out our cloudposse/packages
repository https://github.com/cloudposse/packages/tree/master/vendor.
We publish special packages for kubectl
to support installing multiple versions simultaneously. We support package pinning going back as far as kubectl-0.13
😵 ).
You’ll want to install the version of kubectl
by adding it to the Dockerfile
for your infrastructure repository.
The version of kubectl
CLI needs to be within one minor version of the version of Kubernetes Version you are using. You can use kubectl
v1.18 with Kubernetes 1.17, 1.18, and 1.19, but should not use it with 1.20
kubectl-1.13 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.13 | Ubuntu: apt-get install -y kubectl-1.13 Alpine: apk add kubectl-1.13@cloudposse |
kubectl-1.14 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.14 | Ubuntu: apt-get install -y kubectl-1.14 Alpine: apk add kubectl-1.14@cloudposse |
kubectl-1.15 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.15 | Ubuntu: apt-get install -y kubectl-1.15 Alpine: apk add kubectl-1.15@cloudposse |
kubectl-1.16 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.16 | Ubuntu: apt-get install -y kubectl-1.16 Alpine: apk add kubectl-1.16@cloudposse |
kubectl-1.17 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.17 | Ubuntu: apt-get install -y kubectl-1.17 Alpine: apk add kubectl-1.17@cloudposse |
kubectl-1.18 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.18 | Ubuntu: apt-get install -y kubectl-1.18 Alpine: apk add kubectl-1.18@cloudposse |
kubectl-1.19 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.19 | Ubuntu: apt-get install -y kubectl-1.19 Alpine: apk add kubectl-1.19@cloudposse |
kubectl-1.20 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.20 | Ubuntu: apt-get install -y kubectl-1.20 Alpine: apk add kubectl-1.20@cloudposse |
kubectl-1.21 | https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.21 | Ubuntu: apt-get install -y kubectl-1.21 Alpine: apk add kubectl-1.21@cloudposse |
Switch kubectl
Versions (Local Geodesic Shell)
To see Versions supported Check either the Dockerfile
, or run ls /usr/share/kubectl/
on your geodesic shell
update-alternatives --set kubectl /usr/share/kubectl/${Major.Minor}/bin/kubectl
Upgrade EKS Cluster Version
You cannot upgrade an EKS cluster more than 1 minor patch level at a time. To get from 1.17 to 1.20 requires 3 separate upgrades: 1.17 → 1.18, 1.18 → 1.19, 1.19 → 1.20, 1.20 → 1.21
-
Open a Pull Request to update the EKS Component settings.
-
Update the variable
cluster_kubernetes_version
in the stack configuration -
Update the
addons
that correspond to thecluster_kubernetes_version
(see How to Upgrade EKS Cluster Addons ) -
Update the
Dockerfile
to ensure the version ofkubectl
installed corresponds to thecluster_kubernetes_version
-
Merge the Pull Request
-
Confirm the changes in Spacelift (or
atmos terraform apply
the changes). -
Verify cluster is operational afterward by running:
-
Node Check
kubectl get nodes
-
Pod Check
kubectl get pods --all-namespaces
-
Check the AWS Console for EKS
This can take between 20-40 minutes depending on the size of the cluster!
What to do if Spacelift Runner Crashes while updating:
The spacelift worker occasionally crashes. if it does so during the long update of EKS Cluster version. Check AWS, you should find that the cluster is in the Updating
state. If it is do nothing, this will allow the cluster to finish updating. Once the Cluster is updated to your targeted version you can re-trigger the stack. It is highly likely that in the event of a crash that terraform state will be locked, follow the steps below to unlock your terraform state. After unlocking your state, re-trigger the spacelift stack and confirm the changes. this should apply successfully.
Unlocking Terraform State
To unlock the state go to your infrastructure repository, and run
atmos terraform plan eks -s <my-stack>
Then cd
into the directory of the component, such as
cd components/terraform/eks
and run
terraform force-unlock -force <LockID>
Update EKS Node Pool AMI
See https://docs.aws.amazon.com/eks/latest/userguide/update-managed-node-group.html
Override EKS Node Pool AMI Type
See https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html
Use ami_type
to set a valid value. Current supported types are AL2_x86_64 | AL2_x86_64_GPU | AL2_ARM_64
There is a pending PR for BOTTLEROCKET_ARM_64 | BOTTLEROCKET_x86_64
ami types
Override EKS Node Pool AMI Custom
Usually this shouldn’t need to be used since we support amazon’s ami_type
argument but if overriding the image with a custom AMI is desired then follow the below steps.
The upstream module is https://github.com/cloudposse/terraform-aws-eks-node-group
-
Set
ami_image_id
ineks/modules/node_groups_by_az
on thecloudposse/eks-node-group/aws
module -
Set
after_cluster_joining_userdata = ["ls"]
so the custom AMI will need to be used