Skip to main content

FAQ

Frequently asked questions about EKS with Cloud Posse's reference architecture.

How can I create secrets for an EKS cluster?

Consider deploying the external-secrets-operator component.

This component creates an external SecretStore configured to synchronize secrets from AWS SSM Parameter store as Kubernetes Secrets within the cluster. Per the operator pattern, the external-secret-operator pods will watch for any ExternalSecret resources which reference the SecretStore to pull secrets from.

How does the alb-controller-ingress-group determine the name of the ALB?

  1. First the component uses the null-label module to generate our intended name. We do this to meet the character length restrictions on ALB names. ref
  2. Then we pass that output to the Kubernetes Ingress resource with an annotation intended to define the ALB's name. ref
  3. Now the Ingress is created and alb-controller creates an ALB using the annotations on that Ingress. This ALB name will have a dynamic character sequence at the end of it, so we cannot know what the name will be ahead of time.
  4. Finally, we grab the actual name that is given to the created ALB with the data.aws_lb resources. ref
  5. Then output that name for future reference. ref

How can we create Self-Hosted Runners for GitHub with EKS?

Self-Hosted Runners are a great way to save cost and add customizations with GitHub Actions. Since we've already implemented EKS for our platform, we can build off that foundation to create another cluster to manage Self-Hosted runners in GitHub. We deploy that new EKS cluster to core-auto and install the Actions Runner Controller (ARC) chart. This controller will launch and scale runners for GitHub automatically.

For more on how to set up ARC, see the GitHub Action Runners setup docs for EKS.

Common Connectivity Issues and Solutions

If you're having trouble connecting to your EKS cluster, follow these comprehensive steps to diagnose and resolve the issue:

1. Test Basic Connectivity

First, test basic connectivity to your cluster endpoint. This helps isolate whether the issue is with basic network connectivity or something more specific:

curl -fsSk --max-time 5 "https://CLUSTER_ENDPOINT/healthz"

If these tests fail, it indicates a fundamental connectivity issue that needs to be addressed before proceeding to more specific troubleshooting.

2. Check Node Communication

If worker nodes aren't joining the cluster, follow these detailed steps:

  • Verify that the addon stack file (e.g., stacks/catalog/eks/mixins/k8s-1-29.yaml) is imported into your stack.
  • Verify cluster add-ons are properly configured for your EKS version.
    • Check CoreDNS is running
    • Verify kube-proxy is deployed
    • Ensure VPC CNI is correctly configured
  • Confirm the rendered component stack configuration.
atmos describe component eks/cluster -s <stack>

3. Verify Network Configuration

  • Security Groups:
    • Control plane security group must allow port 443 inbound from worker nodes
    • Worker node security group must allow all traffic between nodes
    • Verify outbound internet access for pulling container images
  • Subnet Routes:
    • Verify route tables have paths to all required destinations
    • Check for conflicting or overlapping CIDR ranges
    • Ensure NAT Gateway is properly configured for private subnets
  • Transit Gateway:
    • Verify TGW attachments are active and associated
    • Check TGW route tables for correct propagation
    • Confirm cross-account routing if applicable
  • Private Subnets Configuration:
    • Set cluster_private_subnets_only: true in your configuration
    • Ensure private subnets have proper NAT Gateway routing

4. VPN Connectivity

When accessing via AWS Client VPN, verify these configurations:

  • VPN Routes:
    • Check route table entries for EKS VPC CIDR
    • Verify routes are active and not in pending state
    • Confirm no conflicting routes exist
  • Subnet Associations:
    • Ensure VPN endpoint is associated with correct subnets
    • Verify subnet route tables include VPN CIDR range
  • Authorization Rules:
    • Check network ACLs allow VPN CIDR range
    • Verify security group rules permit VPN traffic
    • Confirm IAM roles have necessary permissions

After making any changes, have clients disconnect and reconnect to receive updated routes.

5. Advanced Diagnostics

  • AWS Reachability Analyzer:
    • Enable cross-account analysis for VPC peering or TGW connections
    • Test from VPN ENI to cluster endpoint
    • Test return path from cluster to VPN ENI