
Making Our Docs AI-Friendly with llms.txt

Erik Osterman
Cloud Posse

We've implemented the llms.txt standard to make our documentation more accessible to AI assistants, ensuring better responses when you ask ChatGPT, Claude, or other LLMs about Cloud Posse tools and best practices.

Hello SweetOps!

As AI assistants become increasingly integrated into developer workflows, we're excited to announce support for the llms.txt standard across our documentation site.

What is llms.txt?

Think of llms.txt as the AI equivalent of robots.txt for search engines. It's an emerging standard that provides LLMs with structured, curated documentation in a format optimized for their understanding. Rather than crawling entire websites and hitting context window limits, AI assistants can now access our documentation in two curated formats:

  • /llms.txt - A compact list of important documentation links
  • /llms-full.txt - Full documentation content in markdown format
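
For reference, an llms.txt file is just markdown: a top-level title, a short summary, and curated lists of links with one-line descriptions. A minimal, hypothetical sketch (the links below are illustrative placeholders, not our actual entries):

    # Cloud Posse Documentation

    > Reference architecture, Atmos, and Terraform components for building platforms on AWS.

    ## Atmos

    - [Stacks](https://docs.cloudposse.com/...): How to define and organize Atmos stacks

    ## Components

    - [Component Library](https://docs.cloudposse.com/...): Catalog of Terraform components for AWS

The llms-full.txt variant inlines the full markdown content of each page instead of linking to it.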

Why This Matters

When you ask an AI assistant about Atmos, Terraform components, or Cloud Posse best practices, you'll get more accurate and up-to-date responses. The assistant can reference our curated documentation directly instead of relying on training data that may be outdated or incomplete.

Benefits for the Community

  • Better AI Assistance: More accurate responses when asking AI tools about our projects
  • Efficient Context Usage: LLMs can access precisely what they need without crawling
  • Up-to-Date Information: Always references current documentation, not stale training data
  • Developer Velocity: Faster answers mean less time searching, more time building

How It Works

We're using the docusaurus-plugin-llms to automatically generate these files from our Docusaurus site. The plugin:

  1. Prioritizes core documentation sections (Atmos, components, tutorials)
  2. Includes blog content for recent updates and announcements
  3. Generates both compact and full-text versions
  4. Automatically updates with each documentation deployment
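
If you maintain your own Docusaurus site, wiring this up is a small change to your site configuration. A minimal sketch, assuming default options (the plugin exposes settings for which sections to include and how to order them; consult its README rather than this snippet):

    // docusaurus.config.js (sketch)
    module.exports = {
      // ...your existing site configuration...
      plugins: [
        // Generates /llms.txt and /llms-full.txt as part of every build
        'docusaurus-plugin-llms',
      ],
    };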

Try It Out

Next time you're working with an AI assistant, try asking questions about:

  • "How do I configure Atmos stacks?"
  • "What's the latest on Cloud Posse component deprecation?"
  • "Show me examples of Terraform component patterns"

The assistant will have direct access to our structured documentation, leading to better, more accurate responses.

Join the Movement

The llms.txt standard is gaining adoption across the developer community. If you maintain documentation, consider implementing it for your projects. It's a simple way to make your content more accessible to the AI tools your users are already using.

Questions?

Have feedback about our AI-friendly documentation? Join us in the SweetOps Slack or GitHub Discussions.

Introducing Our Component Deprecation Process

Erik Osterman
Cloud Posse

We've documented our formal process for deprecating and archiving components to ensure transparency and give our community adequate notice when repositories are being sunset.

Hello SweetOps!

As part of our commitment to maintaining 300+ open source projects across Terraform modules, components, and other tooling, we occasionally need to deprecate repositories that are no longer actively maintained or have been superseded by better alternatives.

What to Expect

We've added comprehensive documentation outlining our Deprecation and Archival Process to ensure this transition is as smooth as possible for everyone in our community.

When we deprecate a repository, here's what you can expect:

  1. GitHub Issue Created: A pinned issue with detailed explanation, timeline, and migration guidance
  2. README Warnings Added: Prominent deprecation notices at the top of documentation
  3. Blog Post Published: Announcement in our changelog/blog about the deprecation
  4. Pull Request Submitted: All changes announced via PR for community visibility
  5. Grace Period: Typically 90+ days for the community to migrate and ask questions
  6. Repository Archived: After the grace period, repos are archived (not deleted) and remain publicly accessible
  7. Blog Post Updated: Announcement updated to reflect the archival completion

Why This Matters

This structured approach ensures that:

  • You have advance notice before any repository is archived
  • Migration paths and alternatives are clearly documented
  • Historical access to code is preserved
  • The community can provide feedback during the deprecation period

Our Commitment

As stated in our GitHub documentation, we commit to always provide free and public access to our Open Source repositories. Even when archived, repositories remain accessible for historical reference and continued use.

Questions?

If you have questions about deprecated components or need migration assistance, reach out in the SweetOps Slack or GitHub Discussions.

Why We Recommend Managed Node Groups Over Fargate for EKS Add-Ons

Erik Osterman
Cloud Posse

When simplicity meets automation, sometimes it's the hidden complexity that bites back.

For a while, running Karpenter on AWS Fargate sounded like a perfect solution. No nodes to manage, automatic scaling, and no EC2 lifecycle headaches. The AWS EKS Best Practices Guide and Karpenter's official documentation both present Fargate as a viable option for running the Karpenter controller.

But in practice, that setup started to cause problems for certain EKS add-ons. Over time, those lessons led us — and our customers — to recommend using a small managed node group (MNG) instead of relying solely on Fargate.

This recommendation diverges from some official AWS guidance, and we acknowledge that. Here's why we made this decision.

Why Fargate Was Attractive (and Still Is, Sometimes)

The appeal of Fargate for Karpenter is understandable:

  • No need to bootstrap a managed node group before deploying Karpenter
  • Simpler initial setup for teams not using Infrastructure-as-Code frameworks
  • Karpenter's early versions had limited integration with managed node pools
  • It showcased Karpenter's capabilities in the most dramatic way possible

For teams deploying clusters manually or with basic tooling, Fargate eliminates several complex setup steps. But when you're using sophisticated Infrastructure-as-Code like Cloud Posse's Terraform components, that initial complexity is already handled—and the operational benefits of a managed node group become far more valuable.

The Problem with "No Nodes" (and the Terraform Catch-22)

EKS cluster creation with Terraform requires certain managed add-ons — like CoreDNS or the EBS CSI driver — to become active before Terraform considers the cluster complete.

But Fargate pods don't exist until there's a workload that needs them. That means when Terraform tries to deploy add-ons, there are no compute nodes for the add-ons to run on. Terraform waits… and waits… until the cluster creation fails.

Terraform enforces a strict dependency model: it won't complete a resource until it's ready. Without a static node group, Terraform can't successfully create the cluster (because the add-ons can't start). And without those add-ons running, Karpenter can't launch its first node (because Karpenter itself is waiting on the cluster to stabilize).

This circular dependency means your beautiful "fully automated" Fargate-only cluster gets stuck in the most ironic place: bootstrap deadlock.

You can manually retry or patch things later, but that defeats the purpose of automation. We build for repeatability — not babysitting.
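
Concretely, the failure shows up on the EKS add-on resources themselves. A simplified Terraform sketch of the pattern (resource and attribute names are illustrative, not lifted from our components):

    resource "aws_eks_addon" "coredns" {
      cluster_name = aws_eks_cluster.this.name
      addon_name   = "coredns"

      # Terraform polls this add-on until EKS reports it ACTIVE. With no
      # schedulable capacity (Fargate-only, no managed node group), the
      # CoreDNS pods stay Pending, the add-on reports DEGRADED, and the
      # apply eventually times out, failing cluster creation.
    }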

The Hidden Cost of "Serverless Nodes"

Even after getting past cluster creation, there are subtle but serious issues with high availability.

Per AWS and Cloud Posse best practices, production-grade clusters should span three availability zones, with cluster-critical services distributed across them.

However, during initial scheduling without a managed node group, Karpenter might spin up just one node large enough to fit all your add-on pods — even if they request three replicas with anti-affinity rules. Kubernetes will happily co-locate them all on that single node.

Once they're running, those pods don't move automatically, even as the cluster grows. The result?

A deceptively healthy cluster with all your CoreDNS replicas living on the same node in one AZ — a single point of failure disguised as a distributed system.

While topologySpreadConstraints can help encourage multi-AZ distribution, they don't guarantee it during the critical cluster bootstrap phase when Karpenter is creating its first nodes.
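
For reference, this is what a zone-spread constraint looks like on a pod spec; a minimal fragment (not a complete manifest), using the standard Kubernetes topology key and the label EKS applies to CoreDNS:

    # Fragment of a Deployment pod template, e.g. CoreDNS
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        # "ScheduleAnyway" only prefers spreading; during bootstrap, when a
        # single Karpenter-provisioned node is the only capacity, replicas
        # still co-locate on it.
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            k8s-app: kube-dns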

The Solution: A Minimal Managed Node Pool

Our solution is simple:

Deploy a tiny managed node group — one node per availability zone — as part of your base cluster.

  • This provides a home for cluster-critical add-ons during creation
  • It ensures that CoreDNS, EBS CSI, and other vital components are naturally distributed across AZs
  • It gives Karpenter a stable platform to run on
  • And it eliminates the bootstrap deadlock problem entirely

You can even disable autoscaling for this node pool. One node per AZ is enough.

Think of it as your cluster's heartbeat — steady, predictable, and inexpensive.
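
In raw Terraform terms (our components wrap this in Atmos stack configuration), the idea reduces to one small, fixed-size managed node group. A hedged sketch with purely illustrative names and sizes:

    resource "aws_eks_node_group" "critical_addons" {
      cluster_name    = aws_eks_cluster.this.name
      node_group_name = "critical-addons"
      node_role_arn   = aws_iam_role.node.arn

      # Three small nodes spread across three private subnets (one per AZ).
      # EKS generally balances the group across the subnets' AZs; for a hard
      # guarantee, create one single-AZ node group per AZ instead.
      subnet_ids     = var.private_subnet_ids
      instance_types = ["c7g.medium"]
      ami_type       = "AL2_ARM_64" # Graviton instances need an ARM image
      capacity_type  = "ON_DEMAND"  # cluster-critical pods never ride Spot

      scaling_config {
        min_size     = 3
        max_size     = 3 # no autoscaling; this pool is the stable floor
        desired_size = 3
      }
    }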

Additional Fargate Constraints

Beyond the HA challenges, Fargate has architectural constraints that can affect cluster add-ons:

  • Each Fargate pod runs on its own isolated compute resource (one pod per node)
  • No support for EBS-backed dynamic PVCs; only EFS CSI volumes are supported
  • Fixed CPU and memory configurations with coarse granularity
  • 256 MB memory overhead for Kubernetes components

While these constraints don't necessarily prevent Fargate from working, they add complexity when running cluster-critical infrastructure that needs precise resource allocation and high availability guarantees.

Cost and Flexibility

Fargate offers convenience, but at a premium. A pod requesting 2 vCPUs and 4 GiB of memory costs about $0.098/hour, compared to $0.076/hour for an equivalent EC2 c6a.large instance.
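
(For the curious: at us-east-1 Fargate on-demand rates of roughly $0.04048 per vCPU-hour and $0.004445 per GB-hour, that pod works out to 2 × $0.04048 + 4 × $0.004445 ≈ $0.098/hour; exact rates vary by region.)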

And because Fargate bills in coarse increments, you often overpay for partial capacity.

By contrast, the hybrid approach unlocks significant advantages:

  • Static MNG with On-Demand instances provides a stable foundation for cluster add-ons
  • Use cost-effective Graviton instances (c7g.medium) to reduce baseline costs
  • Karpenter provisions Spot instances exclusively for application workloads (not add-ons)
  • Achieve cost savings on application pods while maintaining reliability for cluster infrastructure

The result: stable cluster services on On-Demand, cost-optimized applications on Spot.
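
Steering Karpenter toward Spot for application capacity is a one-line requirement on its NodePool. A minimal sketch (names are illustrative, and the referenced EC2NodeClass is assumed to be defined elsewhere):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: apps-spot
    spec:
      template:
        spec:
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
          requirements:
            # Only provision Spot capacity from this pool; cluster add-ons
            # stay pinned to the On-Demand managed node group.
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot"]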

The Evolution of Karpenter's Recommendations

Interestingly, the Karpenter team's own guidance has evolved over time. Karpenter's current getting started guide now defaults to using EKS Managed Node Groups in its example configurations, with Fargate presented as an alternative that requires uncommenting configuration sections.

While we can't pinpoint exactly when this shift occurred, it suggests the Karpenter team recognized that managed node groups provide a more reliable foundation for most production use cases.

Lessons Learned

At Cloud Posse, we love automation — but we love reliability through simplicity even more.

Running Karpenter on Fargate works for proof-of-concepts or ephemeral clusters.

But for production systems where uptime and high availability matter, a hybrid model is the clear winner:

  • Static MNG with On-Demand instances for cluster-critical add-ons (CoreDNS, Karpenter, etc.)
  • Karpenter provisioning Spot instances for dynamic application workloads
  • Fargate only when you truly need pod-level isolation

It's not about Fargate being bad — it's about knowing where it fits in your architecture.

When Fargate-Only Might Still Work

To be fair, there are scenarios where running Karpenter on Fargate might make sense:

  • Long-lived development environments where the $120/month MNG baseline cost matters more than availability
  • Clusters deployed manually (not via Terraform) where bootstrap automation isn't critical
  • Proof-of-concept deployments demonstrating Karpenter's capabilities
  • Organizations that have accepted the operational trade-offs and built workarounds

However, be aware that development clusters that are frequently rebuilt will hit the Terraform bootstrap deadlock problem more often—making automation failures a regular occurrence rather than a one-time setup issue.

Your Mileage May Vary

It's worth noting that experienced practitioners in the SweetOps community have successfully run Karpenter on Fargate for years across multiple production clusters. Their setups work, and they've built processes around the constraints.

This proves our recommendation isn't absolute—some teams make Fargate work through careful configuration and accepted trade-offs. However, these same practitioners acknowledged they'd likely choose MNG if starting fresh today with modern tooling.

"Karpenter doesn't use voting. Leader election uses Kubernetes leases. There's no strict technical requirement to have three pods — unless you actually care about staying up."

— Ihor Urazov, SweetOps Slack

That's the key insight. The technical requirements are flexible—it's your operational requirements that determine the right choice.

If staying up matters, if automation matters, if avoiding manual intervention matters, then give your cluster something solid to stand on. A small, stable managed node pool does exactly that.

What About EKS Auto Mode?

It's worth mentioning that AWS introduced EKS Auto Mode in December 2024, which takes a fundamentally different approach to solving these problems.

EKS Auto Mode runs Karpenter and other critical cluster components (like the EBS CSI driver and Load Balancer Controller) off-cluster as AWS-managed services. This elegantly sidesteps the bootstrap deadlock problem entirely—there's no chicken-and-egg dependency because the control plane components don't need to run inside your cluster.

The cluster starts with zero nodes and automatically provisions compute capacity as workloads are scheduled. While this solves the technical bootstrap challenge we've discussed, it comes with trade-offs:

  • Additional 12-15% cost premium on top of EC2 instance costs
  • Lock-in to AWS VPC CNI (can't use alternatives like Cilium or Calico)
  • Less control over cluster infrastructure configuration
  • Available only for Kubernetes 1.29+ and not in all AWS regions

For organizations willing to accept these constraints in exchange for fully managed operations, EKS Auto Mode may address many of the concerns raised in this post. However, for teams requiring fine-grained control, cost optimization, or running on older Kubernetes versions, the MNG + Karpenter approach remains highly relevant.

Announcing Platform Advisory

Erik Osterman
Cloud Posse

We’re excited to announce our new Platform Advisory service—now available to Cloud Posse customers. Private access to Cloud Posse engineers.

Many of our larger customers, especially those in regulated industries like fintech and health tech, have asked for a way to get private, real-time access to senior Cloud Posse engineers for their most critical projects.

These teams often run into scenarios where:

  • Delays, mistakes, or failed migrations would cost big
  • They need to de-risk complex platform changes
  • They want trusted guidance on Cloud Posse architecture and components—from the engineers who built it

Platform Advisory was designed specifically to address these needs.


What is Platform Advisory?

Platform Advisory gives your team:

  • Private Slack Connect → direct access to Cloud Posse’s staff-to-principal-level engineers
  • On-demand Zoom sessions → architecture reviews, migration planning, compliance discussions, and more
  • Same-day response (4-hour SLA) → for high-impact requests
  • 10 hrs/month of Flexible Support included → for bug fixes, upgrades, Atmos enhancements, new components, and integration work

It’s designed for teams running on Cloud Posse’s reference architecture and open source components who need priority access to expert guidance—especially when getting it wrong isn’t an option.


Why we built this

As more of our customers adopt Cloud Posse architecture for mission-critical platforms, they've asked for a way to engage more deeply—especially for projects where de-risking migrations and accelerating delivery matter.

When the team is facing a complex migration, rolling out new environments, or making high-impact platform changes—waiting days for answers isn’t good enough.

Platform Advisory gives them priority access to engineers who:

  • Know Cloud Posse architecture inside and out
  • Understand their environments and goals
  • Can chart the best path forward quickly and safely

How it fits with our other support options

  • Essential Support → Self-service teams who want async guidance
  • Flexible Support → Scheduled hands-on engineering work
  • Platform Advisory → Teams where delays, mistakes, or failed migrations would cost big

Where to learn more

You can explore all the details on the Platform Advisory support page.

If you’re unsure whether Platform Advisory is the right fit for your team, reach out to us—we’re happy to help.

Remember: When you invest in Cloud Posse, you’re not just helping your team—you’re strengthening the ecosystem your business depends on.

We’re excited to make Platform Advisory available—and we look forward to helping more teams succeed on Cloud Posse architecture.

Automated Component Testing

Erik Osterman
Cloud Posse

We're excited to announce the completion of the second phase of our Component Testing project, which has added automated testing for 27 components. This milestone follows our successful migration of 160+ Terraform Components from a monorepo to individual repositories, making them more maintainable and testable.

Hello SweetOps!

A few months ago, we embarked on a MASSIVE project to enable Component Testing.

The goal is to improve the stability of our components, detect and fix integration errors, and pave the way for confident delivery of new features. In the first phase, we split the cloudposse/terraform-aws-components monorepo consisting of 160+ Terraform Components into individual repositories in the cloudposse-terraform-components GitHub organization. We updated the cloudposse/github-action-atmos-component-updater GitHub action to rewrite URLs in component manifests automatically, allowing you to smoothly migrate to new repositories.

Current Status

Now, we are happy to announce that we have completed the second phase of this project, introducing automated tests for the first 27 components. Hopefully, you are already using components from the new organization!

The complete list of covered components can be found here.

We've developed a Go-based testing framework built on top of Terratest, optimized specifically for testing Atmos components.
Additionally, we created a step-by-step guide to help you write effective component tests.
You can track the project's progress on this board.
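
To give a feel for what such a test involves, here is a minimal sketch using plain Terratest primitives; it is not the exact API of our framework, which layers Atmos-aware setup, fixtures, and teardown on top of steps like these (paths, variables, and outputs below are hypothetical):

    package test

    import (
        "testing"

        "github.com/gruntwork-io/terratest/modules/terraform"
        "github.com/stretchr/testify/assert"
    )

    func TestExampleComponent(t *testing.T) {
        t.Parallel()

        opts := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
            // Hypothetical path to the component's Terraform root module
            TerraformDir: "../src",
            Vars: map[string]interface{}{
                "name": "test",
            },
        })

        // Always destroy the real infrastructure the test provisions
        defer terraform.Destroy(t, opts)
        terraform.InitAndApply(t, opts)

        // Assert on an output the component is expected to expose
        id := terraform.Output(t, opts, "id")
        assert.NotEmpty(t, id)
    }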

We invite everyone to contribute to this project.

Please upvote (👍) the "Add component tests" issue in the repository of any component whose test coverage you'd like us to prioritize. If you want to contribute more than that, there's plenty of opportunity.

How can you help?

We really need help writing tests.

You can pick up any "Add component tests" issue with the "Good First Question" label and contribute a test by following our documentation.

We will prioritize reviewing your PRs in the #pr-reviews channel and help ensure they get merged smoothly. Feel free to DM @Erik Osterman or @Igor Rodionov in Slack with any questions or feedback.

Join the Conversation!

Want to help shape the future of our Terraform components? We're building it in the open and you're invited.
Join us in the SweetOps Slack to chat about component testing, automation, and all things Terraform.

P.S.: Huge thanks to @RoseSecurity for the first community-driven component test contribution here.

Terraform Component GitHub Repository Has Moved!

Igor Rodionov
Cloud Posse

The GitHub repository for Cloud Posse's Terraform components has migrated to a dedicated GitHub organization. All documentation remains here, but all future updates, contributions, and issue tracking for the source code should now be directed to the respective repositories in the new organization.

We're excited to announce that on November 12, 2024, we will begin migrating each component in the cloudposse/terraform-aws-components repository to individual repositories under a new GitHub organization. This change aims to improve the stability, maintainability, and usability of our components.

Why This Migration?

Our goal is to make each component easier to use, contribute to, and maintain. This migration will allow us to:

  • Leverage Terratest automation for better testing
  • Implement semantic versioning to clearly communicate updates and breaking changes
  • Improve PR review times and accelerate community contributions
  • Enable Dependabot automation for dependency management
  • And much more!

What to Expect Starting November 12, 2024

Migration Timeline

The migration will begin on November 12 and is anticipated to finish by the end of the following week.

Code Freeze

Starting on November 12, this repository will be set to read-only mode, marking the beginning of a code freeze. No new pull requests or issues will be accepted here after that date.

New Contribution Workflow

After the migration, all contributions should be directed to the new individual component repositories.

Updated Documentation

To support this transition, we are updating our documentation and the Cloud Posse component updater.

Future Archiving

In approximately six months, we plan to archive this repository and transfer it to the cloudposse-archives organization.

Frequently Asked Questions

Does this affect Terraform modules?

No, only the terraform-aws-components repository is affected. Our Terraform modules will remain where they are.

We are committed to making this transition as seamless as possible. If you have any questions or concerns, please feel free to post them in this issue. Your feedback is important to us, and we appreciate your support as we embark on this new chapter!