
Account Management

This chapter presents how Cloud Posse designs and manages AWS Account architectures. We will explain how Cloud Posse provisions and manages AWS Accounts using Atmos and Terraform, the reasoning behind our decisions, and how this architecture will better align your organization with the AWS Well-Architected Framework.

You will learn

  • Why you should leverage multiple AWS accounts within an AWS Organization
  • How we organize accounts into organizational units (OUs) to manage access and apply Service Control Policies (SCPs) to provide guard rails
  • The set of components we use to provision, configure, and manage AWS accounts, including account-level settings, service control policies, and Terraform state backends, using native Terraform with Atmos

The Problem

The AWS Well-Architected Framework defines AWS architectural best practices and presents a set of foundational questions to enable you to understand how a specific architecture aligns with cloud best practices.

The AWS Well-Architected Framework provides several foundational recommendations, one of which is to distribute workloads across multiple AWS accounts. However, the framework does not prescribe how this should be achieved. AWS offers resources such as Control Tower or Account Factory for provisioning accounts, but these have limitations. The primary issue is that they cannot be fully managed with Terraform, so using them requires manual effort.

Our Solution

Cloud Posse has developed a set of components to provision, configure, and manage AWS Accounts and Organizations.


Using an Organization

Leveraging multiple AWS accounts within an AWS Organization is the only way to satisfy the multi-account recommendation above. Guard rails can then be created to restrict what can happen in an account and by whom.

We then further organize the flat account structure into organizational units (OUs). OUs can then apply Service Control Policies (SCPs) to restrict what can happen inside their accounts.

core (OU)
Responsible for management accounts, such as the organizational root account or a network hub. These accounts are singletons and will never need to be duplicated.
plat (OU)
Responsible for platform accounts, such as sandbox, dev, staging, and prod. These accounts are dynamic and can be tailored to the needs of your organization.
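As a rough illustration of how OU placement drives guard rails, the Python sketch below models the two OUs above as plain data and computes which SCPs apply to an account: those attached to the organization root plus those attached to its OU. The policy names and the dictionary layout are hypothetical, not the schema of the account component. One real AWS rule it does encode is that SCPs never restrict the management (root) account.

```python
# Hypothetical SCP names; real policies would come from your stack configuration.
ORG_ROOT_SCPS = ["deny-leave-organization"]

# OU layout using the example accounts described in this chapter.
OUS = {
    "core": {
        "scps": ["deny-unapproved-regions"],
        "accounts": ["root", "audit", "security", "identity",
                     "network", "dns", "auto", "artifacts"],
    },
    "plat": {
        "scps": ["deny-unapproved-regions", "deny-disable-cloudtrail"],
        "accounts": ["sandbox", "dev", "staging", "prod"],
    },
}

MANAGEMENT_ACCOUNT = ("core", "root")


def scps_for(ou: str, account: str) -> list[str]:
    """Return the SCPs inherited by an account: root-level policies first, then OU-level."""
    if account not in OUS[ou]["accounts"]:
        raise KeyError(f"{account!r} is not in OU {ou!r}")
    if (ou, account) == MANAGEMENT_ACCOUNT:
        # SCPs do not affect the organization's management account.
        return []
    return ORG_ROOT_SCPS + OUS[ou]["scps"]


print(scps_for("plat", "prod"))
# ['deny-leave-organization', 'deny-unapproved-regions', 'deny-disable-cloudtrail']
print(scps_for("core", "root"))
# []
```

Attaching a policy once at the OU level, rather than to each account, is what lets new platform accounts inherit the same guard rails automatically.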

Account Boundaries

Constructs like VPCs provide only network-level isolation, not IAM-level isolation. Within a single AWS account, there's no practical way to manage IAM-level boundaries between multiple stages like dev/staging/prod. For example, provisioning most Terraform modules requires “administrative” level access, because provisioning IAM roles requires admin privileges. In a single shared account, a developer would therefore need to be an “admin”, with that access extending to every stage including prod, just to iterate on a module.

Multiple AWS accounts should be used to provide a higher degree of isolation by segmenting workloads. There is no additional cost for operating multiple AWS accounts. It does, however, add management overhead, since a standard set of components must be deployed to manage each account. AWS Support applies to only one account, so it may need to be purchased for each account unless the organization upgrades to Enterprise Support.

Multiple AWS accounts are all managed underneath an AWS Organization and organized into multiple organizational units (OUs). Service Control Policies can restrict what runs in an account and place boundaries around an account that even account-level administrators cannot bypass.
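The boundary described above comes from how AWS evaluates permissions: with a deny-list strategy (the default FullAWSAccess SCP left attached), an action in a member account succeeds only if the caller's IAM policies allow it and no SCP denies it. The Python sketch below models that rule with wildcard action patterns. It is deliberately simplified (it ignores resources, conditions, permission boundaries, and SCP allow-list strategies), and the denied actions are hypothetical examples.

```python
from fnmatch import fnmatchcase


def _matches(action: str, patterns: list[str]) -> bool:
    # IAM action names are case-insensitive, so compare in lowercase.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_allowed(action: str, iam_allow: list[str], scp_deny: list[str]) -> bool:
    """Simplified check: IAM must allow the action and no SCP may deny it."""
    return _matches(action, iam_allow) and not _matches(action, scp_deny)


ACCOUNT_ADMIN = ["*"]  # an account-level administrator policy
SCP_DENY = [           # hypothetical deny-list SCP attached to the OU
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "organizations:LeaveOrganization",
]

print(is_allowed("ec2:RunInstances", ACCOUNT_ADMIN, SCP_DENY))        # True
print(is_allowed("cloudtrail:StopLogging", ACCOUNT_ADMIN, SCP_DENY))  # False
```

Even with an "*" IAM policy, the administrator cannot stop CloudTrail logging, which is the sense in which SCPs bound what account-level administrators can do.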

Account Architecture

By convention, we prefix each account name with its organizational unit (OU) to distinguish it from other accounts of the same type. For example, if we have an OU called plat (short for platform) and an account called "production" (or prod for short), we would name the account plat-prod. In practice, there might be multiple production accounts, such as in a data OU, a network OU, and a plat OU. Prefixing each account name with its OU keeps the names unambiguous and consistent.
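A minimal Python sketch of the naming convention above. The validation rule (rejecting a dash inside either part, so the OU prefix can always be split back off) is an illustrative assumption, not a documented Cloud Posse constraint.

```python
def account_name(ou: str, stage: str) -> str:
    """Prefix the account's short name with its OU, e.g. ("plat", "prod") -> "plat-prod"."""
    for part in (ou, stage):
        # Assumed rule: a dash inside either part would make the prefix ambiguous.
        if not part or "-" in part:
            raise ValueError(f"invalid name part: {part!r}")
    return f"{ou}-{stage}"


# Three production accounts in different OUs stay distinct.
for ou in ("data", "network", "plat"):
    print(account_name(ou, "prod"))
# data-prod
# network-prod
# plat-prod
```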

core-root

The "root" (parent, billing) account creates all child accounts. The root account has special capabilities not found in any other account:

  • By default, an administrator in the root account can assume the OrganizationAccountAccessRole in all other accounts, which grants admin access
  • Organizational CloudTrails can only be provisioned in this account
  • It's the only account that can have member accounts associated with it
  • Service Control Policies can only be set in this account
  • It's the only account that can manage the AWS Organization

core-audit

The "audit" account is where all logs end up.

core-security

The "security" account is where to run automated security scanning software that might operate in a read-only fashion against the audit account.

core-identity

The "identity" account is where to add users and delegate access to the other accounts and is where users log in

core-network

The “network” account is where the transit gateway and all inter-account routing are managed.

core-dns

The “dns” account owns all DNS zones. It may also contain a “legal” role with Route53Registrar.* permissions for managing domain registrations; that role cannot touch zones or anything else, but does include billing access.

core-auto

The “automation” account is where any GitOps automation will live. Some automation (like Spacelift) has “god” mode in this account. The auto account will typically have transit gateway access to all other accounts, so we want to limit what is deployed in it to only those services that need that access.

core-artifacts

The “artifacts” account is where we recommend centralizing and storing CI/CD artifacts (e.g., ECR repositories and other assets).

plat-prod

The "production" account is where you run your most mission-critical applications.

plat-staging

The “staging” account is where QA and integration tests run, in an environment exposed for public consumption.

This is production for QA engineers and partners doing integration tests, so it must be stable enough for third parties to test against. It runs a Kubernetes cluster.

plat-dev

The "dev" account is where automated tests and load tests run, and where infrastructure code is tested.

This is where the entire engineering organization operates daily, so it needs to be stable for developers. This environment is effectively production for developers writing code.

plat-sandbox

The "sandbox" account is where you let your developers have fun and break things. Developers get admin access here, and this is where changes happen first. It is used by developers who need the bleeding edge. Only DevOps engineers, or developers trying to get net-new applications added to tools like slice, work here.

Terraform State

We need somewhere to store the Terraform state. Multiple options exist (e.g., Vault, Terraform Enterprise, GitLab, Spacelift), but the only one we'll focus on right now is S3. The Terraform state may contain secrets, which is unavoidable for certain kinds of resources (e.g., master credentials for RDS clusters). For this reason, companies with security and compliance requirements are advised to segment their state backends, which makes it easier to control with IAM who has access to what.

On the one hand, multiple state backends are good from a security perspective; on the other, they unnecessarily complicate the architecture for companies that do not need the added layer of security.

We will use a single S3 bucket, as it is the least complicated to maintain. Anyone who should be able to run Terraform locally will need read/write access to this state bucket.
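To make the trade-off concrete, here is a hypothetical Python sketch of where a component's state would live. It assumes the S3 backend's workspace layout (workspace_key_prefix/workspace/key) with the component name as the key prefix, the stack name as the workspace, and stack names following a {tenant}-{environment}-{stage} pattern. The bucket names and the per-tenant split are made up for illustration.

```python
SINGLE_BUCKET = "acme-core-use1-root-tfstate"  # hypothetical bucket name


def state_location(stack: str, component: str, segmented: bool = False) -> tuple[str, str]:
    """Return (bucket, key) for a component's Terraform state in a stack.

    Assumes key = <component>/<stack>/terraform.tfstate, i.e. the component is the
    workspace key prefix and the stack name is the Terraform workspace.
    """
    key = f"{component}/{stack}/terraform.tfstate"
    if segmented:
        # Hypothetical alternative: one bucket per tenant (the first part of the
        # stack name), so IAM policies can be scoped per bucket.
        tenant = stack.split("-")[0]
        return f"acme-{tenant}-tfstate", key
    return SINGLE_BUCKET, key


print(state_location("plat-use1-prod", "vpc"))
# ('acme-core-use1-root-tfstate', 'vpc/plat-use1-prod/terraform.tfstate')
print(state_location("plat-use1-prod", "vpc", segmented=True))
# ('acme-plat-tfstate', 'vpc/plat-use1-prod/terraform.tfstate')
```

With the single bucket, every state file shares one bucket, so whoever can read and write that bucket can read every state file. The segmented variant is what lets IAM scope access per bucket, at the cost of more backends to operate.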

Components

Cloud Posse manages this process with the following components.

account

This component is responsible for provisioning the full account hierarchy along with Organizational Units (OUs). It can also associate Service Control Policies (SCPs) with the Organization, with each Organizational Unit, and with each account.

account-settings

This component is responsible for provisioning account-level settings: IAM password policy, AWS Account Alias, EBS encryption, and Service Quotas. We can also leverage this component to enable account- or organization-level budgets.

account-map

Transforms account metadata and stores it in a safe place that all designated roles are able to access, since IAM roles should not be able to read the account component directly. Once account-map is provisioned, other components can use remote state to pull account metadata such as the account ID mapping or the IAM roles to assume for a given account.
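A minimal sketch of the kind of transformation account-map performs: raw account records reduced to the lookups other components read via remote state. The account names follow this chapter's convention, but the IDs are made up, and the role name defaults to the OrganizationAccountAccessRole mentioned earlier purely for illustration; the real component's outputs and role naming may differ.

```python
# Hypothetical raw account records, as the account component might expose them.
ACCOUNTS = [
    {"name": "core-root", "id": "111111111111"},
    {"name": "core-identity", "id": "222222222222"},
    {"name": "plat-prod", "id": "333333333333"},
]


def build_account_map(accounts: list[dict], role_name: str = "OrganizationAccountAccessRole") -> dict:
    """Reduce account records to name -> account ID and name -> role ARN lookups."""
    account_ids = {a["name"]: a["id"] for a in accounts}
    role_arns = {
        name: f"arn:aws:iam::{account_id}:role/{role_name}"
        for name, account_id in account_ids.items()
    }
    return {"account_ids": account_ids, "role_arns": role_arns}


account_map = build_account_map(ACCOUNTS)
print(account_map["account_ids"]["plat-prod"])  # 333333333333
print(account_map["role_arns"]["plat-prod"])
# arn:aws:iam::333333333333:role/OrganizationAccountAccessRole
```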

account-quotas

This component is responsible for requesting service quota increases. We recommend making requests here rather than in account-settings because account-settings is a restricted component that can only be applied by SuperAdmin.

tfstate-backend

Provisions the Terraform state backends. This component already follows all standard best practices around private ACLs, encryption, versioning, locking, etc.

cloudtrail

This component is responsible for provisioning CloudTrail auditing in an individual account. It's expected to be used alongside the cloudtrail-bucket component, since it uses that bucket via remote state.

cloudtrail-bucket

This component is responsible for provisioning a bucket for storing CloudTrail logs for auditing purposes.

Design Decisions

Review Design Decisions and record your decisions now. You will need the results of these decisions going forward.

What comes next?

Next, we'll prepare the organization to provision the Terraform state backend, followed by account provisioning. If you're curious about the thought that went into this process, please review the design decisions documentation.

References

Mixins and Imports with Atmos

As infrastructure grows, we end up with hundreds or thousands of settings for components and stack configurations. If we copy and paste these settings everywhere, it’s error-prone and not DRY. What we really want to do is to define a sane set of defaults and override those defaults when we need them to change.

We accomplish this with mixins. Mixins are imported into all stacks, and each follows a set of rules. We use the mixins/region and mixins/account configurations to define common variables for all stacks. For example, mixins/region/us-east-1.yaml defines the variable region: us-east-1.
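The layering of defaults and overrides can be sketched as a recursive dictionary merge in Python. This is an approximation of the idea, not Atmos's merge implementation: nested maps merge key by key, later layers win, and any non-map value (including lists) is replaced wholesale. Only region: us-east-1 comes from the text above; the other variables are hypothetical.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged key by key; any other value (including lists) is replaced.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# mixins/region/us-east-1.yaml defines region: us-east-1 (from the text above).
region_mixin = {"vars": {"region": "us-east-1"}}
# Hypothetical account mixin and stack-level override.
account_mixin = {"vars": {"stage": "prod", "tags": {"owner": "platform"}}}
stack_overrides = {"vars": {"tags": {"owner": "data"}}}

config: dict = {}
for layer in (region_mixin, account_mixin, stack_overrides):  # later layers win
    config = deep_merge(config, layer)

print(config)
# {'vars': {'region': 'us-east-1', 'stage': 'prod', 'tags': {'owner': 'data'}}}
```

Because the merge never mutates its inputs, the same mixin can be imported by many stacks without one stack's overrides leaking into another.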

Note: Do not import components into the account or region mixins. Because these mixins are imported multiple times to define common variables, any component imports would be duplicated, causing an Atmos error such as this:

Executing command:
/usr/bin/atmos terraform deploy account-settings -s core-gbl-artifacts

Found duplicate config for the component 'account-settings' for the stack 'core-gbl-artifacts' in the files: orgs/cch/core/artifacts/global-region/baseline, orgs/cch/core/artifacts/global-region/monitoring, orgs/cch/core/artifacts/global-region/identity.
Check that all context variables in the stack name pattern '{tenant}-{environment}-{stage}' are correctly defined in the files and not duplicated.
Check that all imports are valid.

exit status 1
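The error above is essentially a duplicate-definition check. The hypothetical Python sketch below shows the idea: given the components that each imported file ends up contributing to a stack, report any component contributed by more than one file. It also builds the stack name from the {tenant}-{environment}-{stage} pattern quoted in the error. This is not how Atmos implements the check; the file paths are copied from the error output, and the assumption is that each file pulled in account-settings through a mixin.

```python
from collections import defaultdict


def stack_name(tenant: str, environment: str, stage: str) -> str:
    """Build a stack name from the '{tenant}-{environment}-{stage}' pattern."""
    return f"{tenant}-{environment}-{stage}"


def find_duplicate_components(contributions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each component defined by more than one imported file to those files.

    `contributions` maps an imported file to the components it (transitively) defines.
    """
    defined_in: dict[str, list[str]] = defaultdict(list)
    for path, components in contributions.items():
        for component in components:
            defined_in[component].append(path)
    return {c: paths for c, paths in defined_in.items() if len(paths) > 1}


# Each file imports a mixin that (incorrectly) imports account-settings.
contributions = {
    "orgs/cch/core/artifacts/global-region/baseline": ["account-settings"],
    "orgs/cch/core/artifacts/global-region/monitoring": ["account-settings"],
    "orgs/cch/core/artifacts/global-region/identity": ["account-settings", "example-component"],  # hypothetical
}

stack = stack_name("core", "gbl", "artifacts")
for component, paths in find_duplicate_components(contributions).items():
    print(f"duplicate config for {component!r} in stack {stack!r}: {', '.join(paths)}")
```

Removing the component import from the mixin leaves one definition per component, which is why components belong in the stack files rather than in the region or account mixins.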