We've implemented the llms.txt standard to make our documentation more accessible to AI assistants, ensuring better responses when you ask ChatGPT, Claude, or other LLMs about Cloud Posse tools and best practices.
Hello SweetOps!
As AI assistants become increasingly integrated into developer workflows, we're excited to announce support for the llms.txt standard across our documentation site.
Think of llms.txt as the AI equivalent of robots.txt for search engines. It's an emerging standard that provides LLMs with structured, curated documentation in a format optimized for their understanding. Rather than crawling entire websites and hitting context window limits, AI assistants can now access our documentation in two curated formats:
/llms.txt - A compact list of important documentation links
/llms-full.txt - Full documentation content in markdown format
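For context, the llms.txt format itself is deliberately simple: a markdown file with a title, a short summary, and curated lists of links. A minimal, illustrative sketch (not the actual contents of our file; the links here are placeholders) looks something like this:

```markdown
# Cloud Posse Documentation

> Reference architecture, Atmos, and Terraform components for AWS.

## Docs

- [Atmos](https://atmos.tools/): Workflow automation for Terraform and infrastructure as code
- [Components](https://docs.cloudposse.com/components/): Reusable Terraform root modules for AWS

## Optional

- [Community](https://cloudposse.com/slack/): SweetOps Slack and support resources
```

The /llms-full.txt variant inlines the full markdown content of each page rather than just linking to it, trading compactness for completeness.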
When you ask an AI assistant about Atmos, Terraform components, or Cloud Posse best practices, you'll get more accurate and up-to-date responses. The assistant can reference our curated documentation directly instead of relying on training data that may be outdated or incomplete.
The llms.txt standard is gaining adoption across the developer community. If you maintain documentation, consider implementing it for your projects. It's a simple way to make your content more accessible to the AI tools your users are already using.
We've documented our formal process for deprecating and archiving components to ensure transparency and give our community adequate notice when repositories are being sunset.
Hello SweetOps!
As part of our commitment to maintaining 300+ open source projects across Terraform modules, components, and other tooling, we occasionally need to deprecate repositories that are no longer actively maintained or have been superseded by better alternatives.
We've added comprehensive documentation outlining our Deprecation and Archival Process to ensure this transition is as smooth as possible for everyone in our community.
When we deprecate a repository, here's what you can expect:
GitHub Issue Created: A pinned issue with detailed explanation, timeline, and migration guidance
README Warnings Added: Prominent deprecation notices at the top of documentation
Blog Post Published: Announcement in our changelog/blog about the deprecation
Pull Request Submitted: All changes announced via PR for community visibility
Grace Period: Typically 90+ days for the community to migrate and ask questions
Repository Archived: After the grace period, repos are archived (not deleted) and remain publicly accessible
Blog Post Updated: Announcement updated to reflect the archival completion
As stated in our GitHub documentation, we are committed to always providing free and public access to our Open Source repositories. Even when archived, repositories remain accessible for historical reference and continued use.
Questions?
If you have questions about deprecated components or need migration assistance, reach out in the SweetOps Slack or GitHub Discussions.
When simplicity meets automation, sometimes it's the hidden complexity that bites back.
For a while, running Karpenter on AWS Fargate sounded like a perfect solution. No nodes to manage, automatic scaling, and no EC2 lifecycle headaches. The AWS EKS Best Practices Guide and Karpenter's official documentation both present Fargate as a viable option for running the Karpenter controller.
But in practice, that setup started to cause problems for certain EKS add-ons. Over time, those lessons led us — and our customers — to recommend using a small managed node group (MNG) instead of relying solely on Fargate.
This recommendation diverges from some official AWS guidance, and we acknowledge that. Here's why we made this decision.
Why Fargate Was Attractive (and Still Is, Sometimes)
The appeal of Fargate for Karpenter is understandable:
No need to bootstrap a managed node group before deploying Karpenter
Simpler initial setup for teams not using Infrastructure-as-Code frameworks
Karpenter's early versions had limited integration with managed node pools
It showcased Karpenter's capabilities in the most dramatic way possible
For teams deploying clusters manually or with basic tooling, Fargate eliminates several complex setup steps. But when you're using sophisticated Infrastructure-as-Code like Cloud Posse's Terraform components, that initial complexity is already handled—and the operational benefits of a managed node group become far more valuable.
The Problem with "No Nodes" (and the Terraform Catch-22)
EKS cluster creation with Terraform requires certain managed add-ons — like CoreDNS or the EBS CSI driver — to become active before Terraform considers the cluster complete.
But Fargate pods don't exist until there's a workload that needs them. That means when Terraform tries to deploy add-ons, there are no compute nodes for the add-ons to run on. Terraform waits… and waits… until the cluster creation fails.
Terraform enforces a strict dependency model: it won't complete a resource until it's ready. Without a static node group, Terraform can't successfully create the cluster (because the add-ons can't start). And without those add-ons running, Karpenter can't launch its first node (because Karpenter itself is waiting on the cluster to stabilize).
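To make the dependency concrete, here's a minimal Terraform sketch. The resource and variable names are illustrative, not our actual components, but the shape of the problem is the same: the CoreDNS add-on only reaches the ACTIVE state once its pods can be scheduled somewhere, which is exactly what a small, static managed node group provides during bootstrap.

```hcl
resource "aws_eks_cluster" "this" {
  name     = "example"
  role_arn = var.cluster_role_arn # hypothetical input

  vpc_config {
    subnet_ids = var.private_subnet_ids # hypothetical input
  }
}

# A small, static managed node group gives cluster add-ons somewhere to run
# before Karpenter exists.
resource "aws_eks_node_group" "default" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "default"
  node_role_arn   = var.node_role_arn # hypothetical input
  subnet_ids      = var.private_subnet_ids
  instance_types  = ["c6a.large"]

  scaling_config {
    desired_size = 3 # one node per availability zone
    min_size     = 3
    max_size     = 3
  }
}

# CoreDNS only becomes ACTIVE once its pods can be scheduled. Without compute
# for those pods (a Fargate-only cluster before Karpenter is running), this
# resource waits until it times out and the apply fails.
resource "aws_eks_addon" "coredns" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "coredns"

  depends_on = [aws_eks_node_group.default]
}
```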
This circular dependency means your beautiful "fully automated" Fargate-only cluster gets stuck in the most ironic place: bootstrap deadlock.
You can manually retry or patch things later, but that defeats the purpose of automation. We build for repeatability — not babysitting.
Even after getting past cluster creation, there are subtle but serious issues with high availability.
By AWS and Cloud Posse best practices, production-grade clusters should span three availability zones, with cluster-critical services distributed across them.
However, during initial scheduling, when Karpenter alone is provisioning capacity without a managed node group, it might spin up just one node large enough to fit all your add-on pods, even if they request three replicas with anti-affinity rules. Kubernetes will happily co-locate them all on that single node.
Once they're running, those pods don't move automatically, even as the cluster grows. The result?
A deceptively healthy cluster with all your CoreDNS replicas living on the same node in one AZ — a single point of failure disguised as a distributed system.
While topologySpreadConstraints can help encourage multi-AZ distribution, they don't guarantee it during the critical cluster bootstrap phase when Karpenter is creating its first nodes.
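For reference, this is roughly what such a constraint looks like on a generic Deployment. This is an illustrative manifest, not one of our components; with whenUnsatisfiable: ScheduleAnyway the spread is only a preference, and even DoNotSchedule can't conjure up capacity in other zones while bootstrap is still underway.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-addon
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-addon
  template:
    metadata:
      labels:
        app: example-addon
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          # ScheduleAnyway is best-effort; DoNotSchedule is stricter but can
          # block scheduling while only one zone has capacity.
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example-addon
      containers:
        - name: app
          image: registry.k8s.io/pause:3.9
```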
Fargate also brings constraints of its own when hosting cluster-critical infrastructure:
Each Fargate pod runs on its own isolated compute resource (one pod per node)
No support for EBS-backed dynamic PVCs; only EFS CSI volumes are supported
Fixed CPU and memory configurations with coarse granularity
256 MB memory overhead for Kubernetes components
While these constraints don't necessarily prevent Fargate from working, they add complexity when running cluster-critical infrastructure that needs precise resource allocation and high availability guarantees.
Fargate offers convenience, but at a premium. A pod requesting 2 vCPUs and 4 GiB of memory costs about $0.098/hour, compared to $0.076/hour for an equivalent EC2 c6a.large instance.
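At those rates, a single always-on 2 vCPU / 4 GiB replica works out to roughly $71.50 per month on Fargate versus about $55.50 on the c6a.large (assuming a 730-hour month), a premium of around 29% before you account for Fargate's one-pod-per-node isolation.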
Interestingly, the Karpenter team's own guidance has evolved over time. Karpenter's current getting started guide now defaults to using EKS Managed Node Groups in its example configurations, with Fargate presented as an alternative that requires uncommenting configuration sections.
While we can't pinpoint exactly when this shift occurred, it suggests the Karpenter team recognized that managed node groups provide a more reliable foundation for most production use cases.
That said, a Fargate-based Karpenter setup can still make sense for organizations that have accepted the operational trade-offs and built workarounds.
However, be aware that development clusters that are frequently rebuilt will hit the Terraform bootstrap deadlock problem more often—making automation failures a regular occurrence rather than a one-time setup issue.
It's worth noting that experienced practitioners in the SweetOps community have successfully run Karpenter on Fargate for years across multiple production clusters. Their setups work, and they've built processes around the constraints.
This shows our recommendation isn't absolute—some teams make Fargate work through careful configuration and accepted trade-offs. However, these same practitioners acknowledged they'd likely choose MNG if starting fresh today with modern tooling.
"Karpenter doesn't use voting. Leader election uses Kubernetes leases. There's no strict technical requirement to have three pods — unless you actually care about staying up."
— Ihor Urazov, SweetOps Slack
That's the key insight. The technical requirements are flexible—it's your operational requirements that determine the right choice.
If staying up matters, if automation matters, if avoiding manual intervention matters, then give your cluster something solid to stand on. A small, stable managed node pool does exactly that.
It's worth mentioning that AWS introduced EKS Auto Mode in December 2024, which takes a fundamentally different approach to solving these problems.
EKS Auto Mode runs Karpenter and other critical cluster components (like the EBS CSI driver and Load Balancer Controller) off-cluster as AWS-managed services. This elegantly sidesteps the bootstrap deadlock problem entirely—there's no chicken-and-egg dependency because the control plane components don't need to run inside your cluster.
The cluster starts with zero nodes and automatically provisions compute capacity as workloads are scheduled. While this solves the technical bootstrap challenge we've discussed, it comes with trade-offs:
Additional 12-15% cost premium on top of EC2 instance costs
Lock-in to AWS VPC CNI (can't use alternatives like Cilium or Calico)
Less control over cluster infrastructure configuration
Available only for Kubernetes 1.29+ and not in all AWS regions
For organizations willing to accept these constraints in exchange for fully managed operations, EKS Auto Mode may address many of the concerns raised in this post. However, for teams requiring fine-grained control, cost optimization, or running on older Kubernetes versions, the MNG + Karpenter approach remains highly relevant.
We’re excited to announce our new Platform Advisory service, now available to Cloud Posse customers: private, direct access to Cloud Posse engineers.
Many of our larger customers, especially those in fintech and health tech operating in regulated industries, have asked for a way to get private, real-time access to senior Cloud Posse engineers for their most critical projects.
These teams often run into scenarios where:
Delays, mistakes, or failed migrations would be extremely costly
They need to de-risk complex platform changes
They want trusted guidance on Cloud Posse architecture and components—from the engineers who built it
Platform Advisory was designed specifically to address these needs.
Private Slack Connect → direct access to Cloud Posse’s staff-to-principal-level engineers
On-demand Zoom sessions → architecture reviews, migration planning, compliance discussions, and more
Same-day response (4-hour SLA) → for high-impact requests
10 hrs/month of Flexible Support included → for bug fixes, upgrades, Atmos enhancements, new components, and integration work
It’s designed for teams running on Cloud Posse’s reference architecture and open source components who need priority access to expert guidance—especially when getting it wrong isn’t an option.
As more of our customers adopt Cloud Posse architecture for mission-critical platforms, they’ve asked for a way to engage more deeply—especially for projects where de-risking migrations and accelerating delivery matters.
When the team is facing a complex migration, rolling out new environments, or making high-impact platform changes—waiting days for answers isn’t good enough.
Platform Advisory gives them priority access to engineers who:
Know Cloud Posse architecture inside and out
Understand their environments and goals
Can chart the best path forward quickly and safely
We're excited to announce the completion of the second phase of our Component Testing project, which has added automated testing for 27 components. This milestone follows our successful migration of 160+ Terraform Components from a monorepo to individual repositories, making them more maintainable and testable.
Hello SweetOps!
A few months ago, we embarked on a MASSIVE project to enable Component Testing.
The goal is to improve the stability of our components, detect and fix integration errors, and pave the way for confident delivery of new features. In the first phase, we split the cloudposse/terraform-aws-components monorepo consisting of 160+ Terraform Components into individual repositories in the cloudposse-terraform-components GitHub organization. We updated the cloudposse/github-action-atmos-component-updater GitHub action to rewrite URLs in component manifests automatically, allowing you to smoothly migrate to new repositories.
Now, we are happy to announce that we have completed the second phase of this project, introducing automated tests for the first 27 components. Hopefully, you are already using components from the new organization!
The complete list of covered components can be found here.
We've developed a Go-based testing framework built on top of Terratest, optimized specifically for testing Atmos components.
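To give a flavor of what these tests look like, here is a generic Terratest-style example in Go. It is a sketch of the underlying approach, not the actual API of our Atmos-aware framework, and the directory path and output name are hypothetical. Our framework builds on this pattern with Atmos-aware helpers and fixtures; the step-by-step guide covers the real workflow.

```go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestExampleComponent(t *testing.T) {
	t.Parallel()

	// Hypothetical path to a component's Terraform source directory.
	options := &terraform.Options{
		TerraformDir: "../src",
		Vars: map[string]interface{}{
			"name": "example",
		},
	}

	// Always destroy the test infrastructure, even if assertions fail.
	defer terraform.Destroy(t, options)

	// Provision the component.
	terraform.InitAndApply(t, options)

	// Assert on an output the component is expected to expose.
	id := terraform.Output(t, options, "id")
	assert.NotEmpty(t, id)
}
```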
Additionally, we created a step-by-step guide to help you write effective component tests.
You can track the project's progress on this board.
We invite everyone to contribute to this project.
Please upvote the "Add component tests" issue in the component repository whose test coverage you'd like us to prioritize. If you'd like to contribute more directly, there's an opportunity to do so.
You can pick up any "Add component tests" issue with the "Good First Question" label and contribute a test by following our documentation.
We will prioritize reviewing your PRs in the #pr-reviews channel and help ensure they get merged smoothly. Feel free to DM @Erik Osterman or @Igor Rodionov in Slack with any questions or feedback.
Join the Conversation!
Want to help shape the future of our Terraform components? We're building it in the open and you're invited.
Join us in the SweetOps Slack to chat about component testing, automation, and all things Terraform.
P.S.: Huge thanks to @RoseSecurity for the first community-driven component test contribution here.
The GitHub repository for Cloud Posse's Terraform components has migrated to a dedicated GitHub organization. All documentation remains here, but all future updates, contributions, and issue tracking for the source code should now be directed to the respective repositories in the new organization.
We're excited to announce that starting on November 12, 2024, we will begin migrating each component in the cloudposse/terraform-aws-components repository to individual repositories under a new GitHub organization. This change aims to improve the stability, maintainability, and usability of our components.
Starting on November 12, this repository will be set to read-only mode, marking the beginning of a code freeze. No new pull requests or issues will be accepted here after that date.
To be clear, only the terraform-aws-components repository is affected. Our Terraform modules will remain where they are.
We are committed to making this transition as seamless as possible. If you have any questions or concerns, please feel free to post them in this issue. Your feedback is important to us, and we appreciate your support as we embark on this new chapter!