# cluster

This component provisions an end-to-end Amazon EKS cluster, including managed node groups and Fargate profiles.
**Windows not supported**

This component has not been tested with Windows worker nodes of any launch type. Although the upstream modules support Windows nodes, there are likely issues around incorrect or insufficient IAM permissions or other configuration that would need to be resolved before this component can properly configure the upstream modules for Windows nodes. If you need Windows nodes, please experiment, be on the lookout for issues, and report any issues to Cloud Posse.
## Usage

**Stack Level**: Regional
Here's an example snippet for how to use this component.
This example expects the Cloud Posse Reference Architecture Identity and Network designs deployed for mapping users to EKS service roles and granting access in a private network. In addition, this example has the GitHub OIDC integration added and makes use of Karpenter to dynamically scale cluster nodes.
For more on these requirements, see Identity Reference Architecture, Network Reference Architecture, the GitHub OIDC component, and the Karpenter component.
### Mixin pattern for Kubernetes version

We recommend separating the Kubernetes version and the related add-on versions into a mixin (one per Kubernetes minor version) to make it easier to run different versions in different environments, for example while testing a new version.

We also recommend leaving the "resolve conflicts" settings unset, and therefore using the default "OVERWRITE" setting, because any custom configuration that you want to preserve should be managed by Terraform configuring the add-ons directly.

For example, create `catalog/eks/cluster/mixins/k8s-1-29.yaml` with the following content:
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        cluster_kubernetes_version: "1.29"

        # You can set all the add-on versions to `null` to use the latest version,
        # but that introduces drift as new versions are released. As usual, we recommend
        # pinning the versions to a specific version and upgrading when convenient.

        # Determine the latest version of the EKS add-ons for the specified Kubernetes version:
        #   EKS_K8S_VERSION=1.29 # replace with your cluster version
        #   ADD_ON=vpc-cni # replace with the add-on name
        #   echo "${ADD_ON}:" && aws eks describe-addon-versions --kubernetes-version $EKS_K8S_VERSION --addon-name $ADD_ON \
        #     --query 'addons[].addonVersions[].{Version: addonVersion, Defaultversion: compatibilities[0].defaultVersion}' --output table
        #
        # To see versions for all the add-ons, wrap the above command in a for loop:
        #   for ADD_ON in vpc-cni kube-proxy coredns aws-ebs-csi-driver aws-efs-csi-driver; do
        #     echo "${ADD_ON}:" && aws eks describe-addon-versions --kubernetes-version $EKS_K8S_VERSION --addon-name $ADD_ON \
        #       --query 'addons[].addonVersions[].{Version: addonVersion, Defaultversion: compatibilities[0].defaultVersion}' --output table
        #   done
        #
        # To see the custom configuration schema for an add-on, run the following command:
        #   aws eks describe-addon-configuration --addon-name aws-ebs-csi-driver \
        #     --addon-version v1.20.0-eksbuild.1 | jq '.configurationSchema | fromjson'
        # See the `coredns` configuration below for an example of how to set a custom configuration.
        #
        # https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
        # https://docs.aws.amazon.com/eks/latest/userguide/managing-add-ons.html#creating-an-add-on
        addons:
          # https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
          # https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html
          # https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html#cni-iam-role-create-role
          # https://aws.github.io/aws-eks-best-practices/networking/vpc-cni/#deploy-vpc-cni-managed-add-on
          vpc-cni:
            addon_version: "v1.16.0-eksbuild.1" # set `addon_version` to `null` to use the latest version
          # https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html
          kube-proxy:
            addon_version: "v1.29.0-eksbuild.1" # set `addon_version` to `null` to use the latest version
          # https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html
          coredns:
            addon_version: "v1.11.1-eksbuild.4" # set `addon_version` to `null` to use the latest version
            # Override the default replica count of 2. In very large clusters, you may want to increase this.
            configuration_values: '{"replicaCount": 3}'
          # https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html
          # https://aws.amazon.com/blogs/containers/amazon-ebs-csi-driver-is-now-generally-available-in-amazon-eks-add-ons
          # https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html#csi-iam-role
          # https://github.com/kubernetes-sigs/aws-ebs-csi-driver
          aws-ebs-csi-driver:
            addon_version: "v1.27.0-eksbuild.1" # set `addon_version` to `null` to use the latest version
            # If you are not using volume snapshots
            # (https://kubernetes.io/blog/2020/12/10/kubernetes-1.20-volume-snapshot-moves-to-ga/#how-to-use-volume-snapshots)
            # (and you probably are not), disable the EBS Snapshotter.
            # See https://github.com/aws/containers-roadmap/issues/1919
            configuration_values: '{"sidecars":{"snapshotter":{"forceEnable":false}}}'
          aws-efs-csi-driver:
            addon_version: "v1.7.7-eksbuild.1" # set `addon_version` to `null` to use the latest version
            # Set a short timeout in case of conflict with an existing efs-controller deployment
            create_timeout: "7m"
```
### Common settings for all Kubernetes versions
In your main stack configuration, you can then set the Kubernetes version by importing the appropriate mixin:
```yaml
import:
  - catalog/eks/cluster/mixins/k8s-1-29

components:
  terraform:
    eks/cluster:
      vars:
        enabled: true
        name: eks
        vpc_component_name: "vpc"
        eks_component_name: "eks/cluster"
        # Your choice of availability zones or availability zone ids
        # availability_zones: ["us-east-1a", "us-east-1b", "us-east-1c"]
        aws_ssm_agent_enabled: true
        allow_ingress_from_vpc_accounts:
          - tenant: core
            stage: auto
          - tenant: core
            stage: corp
          - tenant: core
            stage: network
        public_access_cidrs: []
        allowed_cidr_blocks: []
        allowed_security_groups: []
        enabled_cluster_log_types:
          # Caution: enabling `api` log events may lead to a substantial increase in CloudWatch Logs expenses.
          - api
          - audit
          - authenticator
          - controllerManager
          - scheduler
        oidc_provider_enabled: true
        # Allows the GitHub OIDC role
        github_actions_iam_role_enabled: true
        github_actions_iam_role_attributes: ["eks"]
        github_actions_allowed_repos:
          - acme/infra
        # We recommend, at a minimum, deploying 1 managed node group,
        # with the same number of instances as availability zones (typically 3).
        managed_node_groups_enabled: true
        node_groups: # for most attributes, setting null here means use the setting from node_group_defaults
          main:
            # availability_zones = null will create one autoscaling group
            # in every private subnet in the VPC
            availability_zones: null
            # Tune the desired and minimum group size according to your baseload requirements.
            # We recommend no autoscaling for the main node group, so it will
            # stay at the specified desired group size, with additional
            # capacity provided by Karpenter. Nevertheless, we recommend
            # deploying enough capacity in the node group to handle your
            # baseload requirements, and in production, we recommend you
            # have a large enough node group to handle 3/2 (1.5) times your
            # baseload requirements, to handle the loss of a single AZ.
            desired_group_size: 3 # number of instances to start with, should be >= number of AZs
            min_group_size: 3 # must be >= number of AZs
            max_group_size: 3
            # Can only set one of ami_release_version or kubernetes_version.
            # Leave both null to use the latest AMI for the cluster Kubernetes version.
            kubernetes_version: null # use cluster Kubernetes version
            ami_release_version: null # use latest AMI for Kubernetes version
            attributes: []
            create_before_destroy: true
            cluster_autoscaler_enabled: true
            instance_types:
              # Tune the instance type according to your baseload requirements.
              - c7a.medium
            ami_type: AL2_x86_64 # use "AL2_x86_64" for standard instances, "AL2_x86_64_GPU" for GPU instances
            node_userdata:
              # WARNING: node_userdata is alpha status and will likely change in the future.
              # Also, it is only supported for AL2 and some Windows AMIs, not Bottlerocket or AL2023.
              # Kubernetes docs: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
              kubelet_extra_args: >-
                --kube-reserved cpu=100m,memory=0.6Gi,ephemeral-storage=1Gi --system-reserved
                cpu=100m,memory=0.2Gi,ephemeral-storage=1Gi --eviction-hard
                memory.available<200Mi,nodefs.available<10%,imagefs.available<15%
            block_device_map:
              # EBS volume for local ephemeral storage
              # IGNORED if the legacy `disk_encryption_enabled` or `disk_size` are set!
              # Use "/dev/xvda" for most instances (without local NVMe)
              # running most Linux distributions, "/dev/xvdb" for Bottlerocket
              "/dev/xvda":
                ebs:
                  volume_size: 100 # in GB
                  volume_type: gp3
            kubernetes_labels: {}
            kubernetes_taints: {}
            resources_to_tag:
              - instance
              - volume
            tags: null
        # The abbreviation method used for Availability Zones in your project.
        # Used for naming resources in managed node groups.
        # Either "short" or "fixed".
        availability_zone_abbreviation_type: fixed
        cluster_private_subnets_only: true
        cluster_encryption_config_enabled: true
        cluster_endpoint_private_access: true
        cluster_endpoint_public_access: false
        cluster_log_retention_period: 90
        # List of `aws-team-roles` (in the account where the EKS cluster is deployed) to map to Kubernetes RBAC groups.
        # You cannot set `system:*` groups here, except for `system:masters`.
        # The `idp:*` roles referenced here are created by the `eks/idp-roles` component.
        # While set here, the `idp:*` roles will have no effect until after
        # the `eks/idp-roles` component is applied, which must be after the
        # `eks/cluster` component is deployed.
        aws_team_roles_rbac:
          - aws_team_role: admin
            groups:
              - system:masters
          - aws_team_role: poweruser
            groups:
              - idp:poweruser
          - aws_team_role: observer
            groups:
              - idp:observer
          - aws_team_role: planner
            groups:
              - idp:observer
          - aws_team_role: terraform
            groups:
              - system:masters
        # Permission sets from AWS SSO allowing cluster access
        # See the `aws-sso` component.
        aws_sso_permission_sets_rbac:
          - aws_sso_permission_set: PowerUserAccess
            groups:
              - idp:poweruser
        # Set to false if you are not using Karpenter
        karpenter_iam_role_enabled: true
        # All Fargate Profiles will use the same IAM Role when `legacy_fargate_1_role_per_profile_enabled` is set to false.
        # Recommended for all new clusters, but will damage existing clusters provisioned with the legacy component.
        legacy_fargate_1_role_per_profile_enabled: false
        # While it is possible to deploy add-ons to Fargate Profiles, it is not recommended. Use a managed node group instead.
        deploy_addons_to_fargate: false
```
## EKS Auto Mode
EKS Auto Mode (GA December 2024) delegates compute, networking, and storage management to AWS. When enabled, AWS manages:
- Compute -- Node provisioning via managed Karpenter (replaces self-managed Karpenter and managed node groups)
- Networking -- Elastic load balancing for Services and Ingress (replaces self-managed ALB controller)
- Storage -- EBS block storage via the `ebs.csi.eks.amazonaws.com` provisioner (replaces the self-managed EBS CSI driver)
Auto Mode also manages the `vpc-cni`, `kube-proxy`, `coredns`, and `aws-ebs-csi-driver` add-ons automatically.
### Greenfield (new cluster) with Auto Mode

Create a mixin at `catalog/eks/cluster/mixins/auto-mode.yaml`:
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        # Enable EKS Auto Mode
        auto_mode_enabled: true
        auto_mode_node_pools:
          - general-purpose
          - system
        # Disable self-managed compute -- Auto Mode handles it
        managed_node_groups_enabled: false
        node_groups: {}
        karpenter_iam_role_enabled: false
        fargate_profiles: {}
        # Remove add-ons managed by Auto Mode.
        # Only aws-efs-csi-driver remains (not managed by Auto Mode).
        addons:
          aws-efs-csi-driver:
            addon_version: null # use the latest compatible version
        addons_depends_on: false
```
Import it in your stack configuration after your Kubernetes version mixin:
```yaml
import:
  - catalog/eks/cluster/mixins/k8s-1-31
  - catalog/eks/cluster/mixins/auto-mode
```
When using Auto Mode, do not deploy the following components:
- `eks/karpenter` -- Auto Mode includes managed Karpenter
- `eks/alb-controller` -- Auto Mode includes elastic load balancing
- `eks/karpenter-node-pool` -- Use Auto Mode's built-in `general-purpose` and `system` node pools (custom NodePools can still be created, but must use the `eks.amazonaws.com/v1` NodeClass API)
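If you do create a custom NodePool on an Auto Mode cluster, reference Auto Mode's NodeClass API rather than Karpenter's `EC2NodeClass`. A minimal sketch (the NodePool name, the `default` NodeClass name, and the arm64 requirement are illustrative assumptions):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: custom-arm64 # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        # Auto Mode's NodeClass API group, not karpenter.k8s.aws
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
```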
### Cluster version upgrades with Auto Mode
Auto Mode significantly simplifies Kubernetes version upgrades:
- Bump `cluster_kubernetes_version` in your version mixin (e.g., `"1.31"` to `"1.32"`)
- Apply -- the control plane upgrades in place (~10-15 minutes)
- Nodes auto-update -- Auto Mode's managed Karpenter detects version drift and automatically replaces nodes with the new kubelet version. No manual intervention required.
- Add-ons auto-update -- `vpc-cni`, `kube-proxy`, `coredns`, and `aws-ebs-csi-driver` are automatically upgraded to compatible versions as part of the control plane upgrade.
- Non-Auto-Mode add-ons (e.g., `aws-efs-csi-driver`) still need manual version bumps.
Ensure workloads have PodDisruptionBudgets for graceful node replacement during the rolling update.
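For example, a PodDisruptionBudget that keeps at least two replicas of a hypothetical `web` Deployment available while nodes are drained and replaced:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb # hypothetical name
spec:
  # At most (replicas - 2) pods may be evicted at once
  minAvailable: 2
  selector:
    matchLabels:
      app: web # hypothetical label on your Deployment's pods
```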
For brownfield migration from an existing cluster, see `UPGRADING.md`.
### Important Auto Mode limitations
- Bottlerocket only -- Auto Mode nodes run Bottlerocket. Custom AMIs (AL2, Ubuntu, Windows) are not supported.
- No SSH/IMDS -- Auto Mode nodes are immutable with no SSH or instance metadata access.
- 21-day max node lifetime -- Nodes are automatically replaced within 21 days. Workloads must tolerate rotation.
- EBS provisioner change -- Auto Mode uses `ebs.csi.eks.amazonaws.com` (not `ebs.csi.aws.com`). PVCs created by one provisioner cannot mount on nodes managed by the other.
- CoreDNS runs as a systemd service -- Custom CoreDNS ConfigMaps will not work on Auto Mode nodes.
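Because of the provisioner change, StorageClasses pointing at `ebs.csi.aws.com` will not work for volumes on Auto Mode nodes. A sketch of a StorageClass targeting the Auto Mode provisioner (the name and gp3 parameters are illustrative assumptions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: auto-ebs-gp3 # hypothetical name
# Auto Mode's built-in EBS provisioner
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
```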
## EKS Capabilities
EKS Capabilities allow you to enable AWS-managed operational software directly on your cluster. Supported capability types:
- ARGOCD -- Managed Argo CD for GitOps continuous delivery
- ACK -- AWS Controllers for Kubernetes, enabling management of AWS resources via Kubernetes custom resources
- KRO -- Kube Resource Orchestrator for composing Kubernetes resources
Each capability requires an IAM role. You can either provide your own via `role_arn` or let the component create one automatically. Auto-created roles use a trust policy for `capabilities.eks.amazonaws.com` and can have additional IAM policies attached via `iam_policy_arns`.
### Argo CD capability

The managed Argo CD capability requires `aws-sso` (IAM Identity Center) v3.0.2 or later for RBAC role mapping and the `group_map` output used in the Atmos `!terraform.state` example below.
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        capabilities:
          argocd:
            type: ARGOCD
            configuration:
              argo_cd:
                namespace: argocd
                aws_idc:
                  idc_instance_arn: "arn:aws:sso:::instance/ssoins-1234567890abcdef"
                  # idc_region: null # defaults to cluster region
                rbac_role_mapping:
                  - role: ADMIN
                    identity:
                      - id: "12345678-1234-1234-1234-123456789012" # SSO group ID
                        type: SSO_GROUP
                  - role: VIEWER
                    identity:
                      - id: "87654321-4321-4321-4321-210987654321"
                        type: SSO_GROUP
```
When using Atmos, you can use `!terraform.state` lookups to resolve the SSO instance ARN and group IDs dynamically from the `aws-sso` component outputs instead of hardcoding them:
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        capabilities:
          argocd:
            type: ARGOCD
            configuration:
              argo_cd:
                namespace: argocd
                aws_idc:
                  idc_instance_arn: !terraform.state aws-sso core-gbl-root ssoadmin_instance_arn
                rbac_role_mapping:
                  - role: ADMIN
                    identity:
                      - id: !terraform.state aws-sso core-gbl-root group_map["Managers"]
                        type: SSO_GROUP
```
### ACK capability
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        capabilities:
          ack-s3:
            type: ACK
            iam_policy_arns:
              - "arn:aws:iam::aws:policy/AmazonS3FullAccess"
```
### KRO capability
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        capabilities:
          kro:
            type: KRO
```
### Combining multiple capabilities
Multiple capabilities can be enabled simultaneously:
```yaml
components:
  terraform:
    eks/cluster:
      vars:
        capabilities:
          argocd:
            type: ARGOCD
            configuration:
              argo_cd:
                namespace: argocd
                aws_idc:
                  idc_instance_arn: "arn:aws:sso:::instance/ssoins-1234567890abcdef"
                rbac_role_mapping:
                  - role: ADMIN
                    identity:
                      - id: "12345678-1234-1234-1234-123456789012"
                        type: SSO_GROUP
          ack-s3:
            type: ACK
            iam_policy_arns:
              - "arn:aws:iam::aws:policy/AmazonS3FullAccess"
          kro:
            type: KRO
```
The component outputs `capabilities` (a map of enabled capabilities with ARNs and types) and `capability_role_arns` (a map of auto-created IAM role ARNs) for use by downstream components.
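In Atmos, downstream components can consume these outputs with the same `!terraform.state` lookup pattern shown above. A sketch (the downstream component name and variable are hypothetical):

```yaml
components:
  terraform:
    argocd-apps: # hypothetical downstream component
      vars:
        # IAM role auto-created for the Argo CD capability
        argocd_role_arn: !terraform.state eks/cluster capability_role_arns["argocd"]
```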