How to Upgrade EKS

Problem

EKS clusters need to be upgraded to stay current with the newest features. AWS only supports a limited number of recent Kubernetes minor releases, so staying up to date is important. Upgrading may require updating not only the cluster, but also the client tools, operators, API versions, addons, annotations, and AMIs.

Prerequisites

caution

Before performing any Kubernetes upgrade, make sure that:

  • No breaking changes, such as API deprecations, affect your deployed services.

  • There are no unconfirmed Spacelift runs relating to the EKS component, services running on top of EKS, or anything else EKS depends on (e.g., VPC, Global Accelerator, WAF).

  • All pods are healthy (e.g., not in a crash loop).
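
The pod health check can be sketched as a small filter. This is a minimal sketch, not part of the original procedure: the function name unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE NAME READY STATUS RESTARTS AGE).

```shell
# Print pods that are neither Completed nor fully ready and Running.
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin.
# Output columns: NAMESPACE/NAME STATUS READY
unhealthy_pods() {
  awk '{
    split($3, ready, "/")               # READY looks like "1/2"
    if ($4 == "Completed") next         # finished jobs are fine
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $4, $3
  }'
}

# Usage (assumes kubectl already points at the cluster being upgraded):
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

An empty result suggests nothing is obviously broken; any line printed is a pod worth investigating before starting the upgrade.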

A list of deprecations by version is available in the official Kubernetes deprecation guide: https://kubernetes.io/docs/reference/using-api/deprecation-guide/

You can use tools like Pluto by Fairwinds or kube-no-trouble to find deprecated or removed API versions in your deployed resources:

https://github.com/FairwindsOps/pluto
https://github.com/doitintl/kube-no-trouble

Here’s an example of Pluto output (see the Pluto docs for more details):

pluto detect-helm -owide
NAME NAMESPACE KIND VERSION REPLACEMENT DEPRECATED DEPRECATED IN REMOVED REMOVED IN
cert-manager/cert-manager-webhook cert-manager MutatingWebhookConfiguration admissionregistration.k8s.io/v1beta1 admissionregistration.k8s.io/v1 true v1.16.0 false v1.19.0

Solution

tip

Simply update the cluster_kubernetes_version and addons, then apply the changes with Spacelift or Atmos.

Decide on Kubernetes Version

Kubernetes releases new minor versions (e.g., 1.21) as generally available approximately every three months. Each minor version is supported for approximately twelve months after its first release. Amazon usually adds support for a new version soon after release, typically after the first patch release (e.g., 1.XX.1), and sometimes later.

https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html

Also, understand Amazon's EKS version lifecycle

caution

Note the latest Kubernetes release may not yet be available for EKS.

https://github.com/kubernetes/kubernetes/releases

Upgrade kubectl Version

We need to install a version of kubectl in Geodesic that is within one minor version of the cluster_kubernetes_version. Installation instructions vary depending on whether you’re using Alpine, Ubuntu, or CentOS. For the most current list of kubectl-1.x versions, check out our cloudposse/packages repository: https://github.com/cloudposse/packages/tree/master/vendor

We publish special packages for kubectl to support installing multiple versions simultaneously. (We support package pinning going back as far as kubectl-0.13 😵.)

Install the required version of kubectl by adding its package to the Dockerfile for your infrastructure repository.

caution

The kubectl CLI version needs to be within one minor version of the Kubernetes version you are using. For example, you can use kubectl v1.18 with Kubernetes 1.17, 1.18, and 1.19, but should not use it with 1.20.
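
The skew rule above can be checked mechanically. The following is a minimal sketch under stated assumptions: kubectl_skew_ok is a hypothetical helper name, and both arguments are expected in MAJOR.MINOR form (no patch component).

```shell
# Succeed when the kubectl version is within one minor version of the cluster version.
# $1: kubectl version, $2: cluster version, both as MAJOR.MINOR (e.g. 1.18).
kubectl_skew_ok() {
  [ "${1%%.*}" = "${2%%.*}" ] || return 1   # major versions must match
  client_minor=${1#*.}
  server_minor=${2#*.}
  diff=$((client_minor - server_minor))
  [ "${diff#-}" -le 1 ]                     # absolute difference of at most 1
}

# Usage:
#   kubectl_skew_ok 1.18 1.19 && echo "compatible"
#   kubectl_skew_ok 1.18 1.20 || echo "install a newer kubectl first"
```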

kubectl-1.13
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.13
  Ubuntu: apt-get install -y kubectl-1.13
  Alpine: apk add kubectl-1.13@cloudposse

kubectl-1.14
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.14
  Ubuntu: apt-get install -y kubectl-1.14
  Alpine: apk add kubectl-1.14@cloudposse

kubectl-1.15
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.15
  Ubuntu: apt-get install -y kubectl-1.15
  Alpine: apk add kubectl-1.15@cloudposse

kubectl-1.16
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.16
  Ubuntu: apt-get install -y kubectl-1.16
  Alpine: apk add kubectl-1.16@cloudposse

kubectl-1.17
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.17
  Ubuntu: apt-get install -y kubectl-1.17
  Alpine: apk add kubectl-1.17@cloudposse

kubectl-1.18
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.18
  Ubuntu: apt-get install -y kubectl-1.18
  Alpine: apk add kubectl-1.18@cloudposse

kubectl-1.19
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.19
  Ubuntu: apt-get install -y kubectl-1.19
  Alpine: apk add kubectl-1.19@cloudposse

kubectl-1.20
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.20
  Ubuntu: apt-get install -y kubectl-1.20
  Alpine: apk add kubectl-1.20@cloudposse

kubectl-1.21
  Package: https://github.com/cloudposse/packages/tree/master/vendor/kubectl-1.21
  Ubuntu: apt-get install -y kubectl-1.21
  Alpine: apk add kubectl-1.21@cloudposse

Switch kubectl Versions (Local Geodesic Shell)

To see which versions are supported, check the Dockerfile or run ls /usr/share/kubectl/ in your Geodesic shell.

update-alternatives --set kubectl /usr/share/kubectl/<major.minor>/bin/kubectl

Replace <major.minor> with one of the installed versions, for example 1.21.

Upgrade EKS Cluster Version

caution

You cannot upgrade an EKS cluster by more than one minor version at a time. Getting from 1.17 to 1.20 requires 3 separate upgrades: 1.17 → 1.18, 1.18 → 1.19, and 1.19 → 1.20.
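
To plan the sequence of upgrades, the one-minor-version-at-a-time rule can be expanded into an explicit list. This is an illustrative sketch only: upgrade_path is a hypothetical helper, and it assumes both versions share the same major version and are given as MAJOR.MINOR.

```shell
# Print one line per required upgrade, stepping one minor version at a time.
# $1: current version, $2: target version, both as MAJOR.MINOR (e.g. 1.17 1.20).
upgrade_path() {
  major=${1%%.*}
  from=${1#*.}
  to=${2#*.}
  while [ "$from" -lt "$to" ]; do
    next=$((from + 1))
    echo "$major.$from -> $major.$next"
    from=$next
  done
}

# upgrade_path 1.17 1.20 prints:
#   1.17 -> 1.18
#   1.18 -> 1.19
#   1.19 -> 1.20
```

Each printed line corresponds to one full pass through the upgrade steps in this section.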

  1. Open a Pull Request to update the EKS Component settings.

  2. Update the variable cluster_kubernetes_version in the stack configuration

  3. Update the addons that correspond to the cluster_kubernetes_version (see How to Upgrade EKS Cluster Addons)

  4. Update the Dockerfile to ensure the version of kubectl installed corresponds to the cluster_kubernetes_version

  5. Merge the Pull Request

  6. Confirm the changes in Spacelift (or apply them with atmos terraform apply).

  7. Verify that the cluster is operational afterward:

     • Node check: kubectl get nodes

     • Pod check: kubectl get pods --all-namespaces

     • Check the AWS Console for EKS
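
The node check can be made a little more specific with a filter over the kubectl output. This is a sketch under assumptions: nodes_needing_attention is a hypothetical helper, and it assumes the default column layout of kubectl get nodes --no-headers (NAME STATUS ROLES AGE VERSION). Nodes typically keep reporting the old kubelet version until the node group AMI is updated, so version mismatches right after a control plane upgrade are expected rather than a failure.

```shell
# Print nodes that are not Ready or whose kubelet is not on the target minor version.
# Reads `kubectl get nodes --no-headers` output on stdin.
# $1: target version as MAJOR.MINOR (e.g. 1.21).
nodes_needing_attention() {
  awk -v target="v$1." '
    # VERSION looks like "v1.21.5-eks-bc4871b"; it must start with the target prefix
    $2 != "Ready" || index($5, target) != 1 { print $1, $2, $5 }
  '
}

# Usage (assumes kubectl already points at the upgraded cluster):
#   kubectl get nodes --no-headers | nodes_needing_attention 1.21
```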

info

The upgrade can take 20 to 40 minutes, depending on the size of the cluster!

What to do if Spacelift Runner Crashes while updating:

The Spacelift worker occasionally crashes. If it crashes during the long-running update of the EKS cluster version, check AWS; you should find that the cluster is in the Updating state. If it is, do nothing and let the cluster finish updating. Once the cluster has been updated to your target version, you can re-trigger the stack. A crash will very likely leave the Terraform state locked; follow the steps below to unlock it. After unlocking the state, re-trigger the Spacelift stack and confirm the changes. The run should then apply successfully.
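
Waiting for the cluster to leave the Updating state can be scripted instead of watched in the console. The following is a minimal sketch, not part of the original procedure: wait_until_active and eks_status are hypothetical helper names, my-cluster is a placeholder cluster name, and the AWS CLI is assumed to be installed and authenticated.

```shell
# Poll a status command until it prints ACTIVE.
# $1: command that prints the cluster status
# $2: seconds to sleep between polls
# $3: maximum number of attempts
wait_until_active() {
  attempt=1
  while [ "$attempt" -le "$3" ]; do
    status=$($1)
    echo "attempt $attempt: $status"
    [ "$status" = "ACTIVE" ] && return 0
    sleep "$2"
    attempt=$((attempt + 1))
  done
  return 1   # gave up before the cluster became ACTIVE
}

# Placeholder cluster name; replace it with your own.
eks_status() {
  aws eks describe-cluster --name my-cluster --query 'cluster.status' --output text
}

# Usage: poll once a minute for up to an hour.
#   wait_until_active eks_status 60 60
```

The AWS CLI also provides aws eks wait cluster-active, which may be sufficient when no progress output is needed.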

Unlocking Terraform State

To unlock the state, go to your infrastructure repository and run

atmos terraform plan eks -s <my-stack>

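If the state is locked, the plan fails with an error acquiring the state lock, and the error output includes the lock ID. Then cd into the directory of the component, such as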

cd components/terraform/eks

and run

terraform force-unlock -force <LockID>
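
The lock ID can be pulled out of the failed plan's output rather than copied by hand. This is a sketch under assumptions: extract_lock_id is a hypothetical helper, and it assumes Terraform's usual "Lock Info:" block with an "ID:" field, with or without the "│" border that newer Terraform versions print around errors.

```shell
# Print the lock ID found in Terraform's state lock error output on stdin.
extract_lock_id() {
  awk '{
    # Find the "ID:" label on any line and print the field that follows it
    for (i = 1; i < NF; i++) {
      if ($i == "ID:") { print $(i + 1); exit }
    }
  }'
}

# Usage (the stack name is a placeholder):
#   atmos terraform plan eks -s <my-stack> 2>&1 | extract_lock_id
```

Check that the printed ID matches the one in the error message before passing it to terraform force-unlock.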

Update EKS Node Pool AMI

See https://docs.aws.amazon.com/eks/latest/userguide/update-managed-node-group.html

Override EKS Node Pool AMI Type

See https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html

Use ami_type to set a valid value. Currently supported types are AL2_x86_64 | AL2_x86_64_GPU | AL2_ARM_64.

There is a pending PR for the BOTTLEROCKET_ARM_64 | BOTTLEROCKET_x86_64 AMI types.

Override EKS Node Pool AMI Custom

This usually isn’t needed, since we support Amazon’s ami_type argument. If you do want to override the image with a custom AMI, follow the steps below.

The upstream module is https://github.com/cloudposse/terraform-aws-eks-node-group

  1. Set ami_image_id in eks/modules/node_groups_by_az on the cloudposse/eks-node-group/aws module

  2. Set after_cluster_joining_userdata = ["ls"] so that the custom AMI is used
