Terraform Best Practices

These are the opinionated best practices we follow at Cloud Posse. They are informed by years of experience writing Terraform and borrow from many other helpful resources, like those by HashiCorp.

See our general Best Practices, which also apply to Terraform.

Variables

Use upstream module or provider variable names where applicable

When writing a module that accepts variable inputs, use the same names as the upstream module or provider to avoid confusion and ambiguity.

Use all lower-case with underscores as separators

Avoid introducing syntaxes commonly found in other languages, such as camelCase or PascalCase. For consistency, we want all variables to look uniform. This is also in line with the HashiCorp naming conventions.

Use positive variable names to avoid double negatives

All variable inputs that enable/disable a setting should be named with an _enabled suffix (e.g. encryption_enabled). It is acceptable for default values to be either false or true.

Use description field for all inputs

All variable inputs need a description field. When the field is provided by an upstream provider (e.g. terraform-provider-aws), use the same wording as the upstream docs.

Use sane defaults where applicable

Modules should be as turnkey as possible. The default value should ensure the most secure configuration (e.g. with encryption enabled).

Use nullable = false where appropriate

When passing an argument to a resource, passing null means to use the default value. Prior to Terraform version 1.1.0, passing null to a module input set that value to null rather than to the default value.

Starting with Terraform version 1.1.0, variables can be declared as nullable = false which:

  1. Prevents the variable from being set to null.
  2. Causes the variable to be set to the default value if null is passed in.

Always set nullable = false for variables that should never be set to null. This is particularly important for lists, maps, and objects, which, if not required, should default to empty values (i.e. {} or []) rather than null. It can also be useful to have strings default to "" rather than null and set nullable = false. This simplifies the code, since it can count on the variable having a non-null value.

The default nullable = true never needs to be explicitly set. Leave variables with the default nullable = true if a null value is acceptable.
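
For illustration, a minimal sketch of a list input that should never be null (the variable name is illustrative):

variable "allowed_cidr_blocks" {
  type        = list(string)
  description = "List of CIDR blocks allowed to access the resource"
  default     = []
  nullable    = false # passing null yields the default [] instead of null
}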

Use feature flags, list, or map inputs for optional functionality

All Cloud Posse modules should respect the null-label enabled feature flag: when enabled is false, the module should create no resources and generate null outputs (or, in the case of list, map, and object outputs, empty values, which may be preferable so that modules consuming the outputs do not fail on a null rather than an empty value).

Optional functionality should be toggled in one of two ways (a sketch of the first follows this list):

  1. Use a feature flag: an input variable of type bool with a name ending in _enabled. Use this mechanism if the option requires no further configuration, e.g. iam_role_enabled or s3_bucket_enabled. Feature flags should always be nullable = false, but the default value can be true or false as desired.
  2. If an optional feature requires further configuration, use a list or map input variable, with an empty input disabling the option and a non-empty input providing configuration. In this case, only use a separate feature flag if the list or map input may still cause problems due to relying on values computed during the plan phase. See Count vs. For Each and Terraform Errors When Planning for more information.

It is never acceptable for an optional feature of the module to be toggled by the value of a string or number input variable, due to the issues explained in Terraform Errors When Planning. However, an optional feature of a resource may be toggled by such an input if that is the behavior of the resource and the input has the same name as the resource argument.
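
A minimal sketch of the feature-flag pattern (the role name and policy here are illustrative):

variable "iam_role_enabled" {
  type        = bool
  description = "Set true to create an IAM role for the service"
  default     = false
  nullable    = false
}

resource "aws_iam_role" "default" {
  # Create the role only when the feature flag is enabled
  count = var.iam_role_enabled ? 1 : 0

  name = "example-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}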

Use objects with optional fields for complex inputs

When a module requires a complex input, use an object with optional fields. This provides documentation and plan-time validation while avoiding type conversion errors, and allows for future expansion without breaking changes. Make as many fields as possible optional, provide defaults at every level of nesting, and use nullable = false if possible.

Extra (or Misspelled) Fields in Object Inputs Will Be Silently Ignored

If you use an object with defaults as an input, Terraform will not give any indication if the user provides extra fields in the input object. This is particularly a problem if the user misspells an optional field name, because the misspelled field will be silently ignored and the default value the user intended to override will silently be used. This is a limitation of Terraform. Furthermore, there is no way to add any checks for this situation, because the input will have already been transformed (unexpected fields removed) by the time any validation code runs. This makes using an object a trade-off versus using separate inputs, which do not have this problem, or type = any, which allows you to write validation code to catch this problem and additional code to supply defaults for missing fields.

Reserve type = any for exceptional cases where the input is highly variable and/or complex, and the module is designed to handle it. For example, the configuration of a Datadog synthetic test is highly complex, and the Cloud Posse module accepts either an object derived from the synthetics_test resource schema or an object derived from the JSON output of the Datadog API. In this rare case, attempting to maintain a type definition would not only be overly complex, it would slow the adoption of future additions to the interface, so type = any is appropriate.

When reviewing Cloud Posse modules as examples, you may notice that they often use a large number of input variables of simple types. This is because in the early development of Terraform, there was no good way to define complex objects with defaults. However, now that Terraform supports complex objects with field-level defaults, we recommend using a single object input variable with such defaults to group related configuration, taking into consideration the trade-offs listed in the above caution. This makes the interface easier to understand and use.

For example, prefer:

variable "eip_timeouts" {
type = object({
create = optional(string)
update = optional(string)
delete = optional(string, "30m")
}))
default = {}
nullable = false
}

rather than:

variable "eip_create_timeout" {
type = string
default = null
}
variable "eip_update_timeout" {
type = string
default = null
}
variable "eip_delete_timeout" {
type = string
default = "30m"
}

However, using an object with defaults versus multiple simple inputs is not without trade-offs, as explained in the above caution.

There are a few ways to mitigate this problem besides using separate inputs:

  • If all the defaults are null or empty, you can use a map(string) input variable and use the keys function to check for unexpected fields (see the sketch after this list). This catches errors, but has the drawback that it does not provide documentation of what fields are expected.
  • You can use type = any for inputs, but then you have to write the extra code to validate the input and supply defaults for missing fields. You should also document the expected fields in the input description.
  • If all you are worried about is misspelled field names, you can make the correctly spelled field names required, ensuring they are supplied. Alternatively, if the misspelling is predictable, such as a field named minsize that users are likely to supply as min_size, you can make the misspelled field name optional with a sentinel value and then check for that value in the validation code.
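
A minimal sketch of the keys-based check, reusing the eip_timeouts example (the allowed key names are illustrative):

variable "eip_timeouts" {
  type     = map(string)
  default  = {}
  nullable = false

  validation {
    # Reject any keys outside the documented set
    condition     = length(setsubtract(keys(var.eip_timeouts), ["create", "update", "delete"])) == 0
    error_message = "Allowed keys are \"create\", \"update\", and \"delete\"."
  }
}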

Use custom validators to enforce custom constraints

Use the validation block to enforce custom constraints on input variables. A custom constraint is one that, if violated, would not otherwise cause an error, but would cause the module to behave in an unexpected way.

For example, if the module takes an optional IPv6 CIDR block, you might receive that in a variable of type list(string) for reasons explained here. Use a custom validator to ensure that the list has at most one element, because if it has more, the module will ignore the extra elements, while a reasonable person might expect them to be used in some way. At the same time, it is not necessary to use a custom validator to enforce that the input is a valid IPv6 CIDR block, because Terraform will already do that for you.
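
A minimal sketch of such a validator (the variable name is illustrative):

variable "ipv6_cidr_blocks" {
  type     = list(string)
  default  = []
  nullable = false

  validation {
    # The module only uses the first element; reject silently ignored extras
    condition     = length(var.ipv6_cidr_blocks) < 2
    error_message = "Supply at most one IPv6 CIDR block."
  }
}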

This is perhaps better illustrated by an example of a pseudo-enumeration. Terraform does not support real enumerations, so they are typically implemented as strings with a limited number of acceptable values. For example, say you have a resource that takes a frequency input that must be either DAILY or WEEKLY. Even though the value of the string is very restricted, you should not use a custom validator to enforce that, for two reasons:

  1. Terraform (technically, the Terraform resource provider) will already enforce that the string is one of those two values and produce an informative error message if it is not.
  2. If you use a custom validator to enforce that the string is one of those two values, and a later version of the resource adds a new option to the enumeration, such as "HOURLY", the custom validator will prevent the module from using the new value, even though the module would function perfectly were it not for the validator. This adds work and delay to the adoption of underlying enhancements to the resource, without providing enough benefit to be worth the extra effort.

Use variables for all secrets with no default value and mark them "sensitive"

Variable inputs for secrets must never define a default value. This ensures that Terraform requires the user to supply a value. The exception is a secret that is optional and will be generated automatically for the user when left null or "" (empty).

Use sensitive = true to mark all secret variables as sensitive. This ensures that the value is not printed to the console.
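
For example (the variable name is illustrative):

variable "database_password" {
  type        = string
  description = "Password for the database master user"
  sensitive   = true # redact the value in plan/apply output
}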

Outputs

Use description field for all outputs

All outputs must have a description set. The description should be based on (or adapted from) the upstream Terraform provider where applicable. Avoid simply repeating the variable name as the output description.

Use well-formatted snake case output names

Avoid introducing syntaxes commonly found in other languages, such as camelCase or PascalCase. For consistency, we want all outputs to look uniform. It also makes code more consistent when using outputs together with terraform_remote_state to access those settings from across modules.

Never output secrets

Secrets should never be outputs of modules. Rather, they should be written to secure storage such as AWS Secrets Manager, AWS SSM Parameter Store with KMS encryption, or S3 with KMS encryption at rest. Our preferred mechanism on AWS is SSM Parameter Store. Values written to SSM are easily retrieved by other Terraform modules, or even on the command line using tools like chamber by Segment.io.

We are very strict about this in our components (a.k.a. root modules), the top-most modules, because these sensitive outputs are easily leaked in CI/CD pipelines (see tfmask for masking secrets in output, only as a last resort). We are less strict about this in modules that are typically nested inside other modules.

Rather than outputting a secret, you may output plain text indicating where the secret is stored, for example, "RDS master password is in SSM parameter /rds/master_password". You may also want another output just for the key of the secret in the secret store, so the key is available to other programs which may be able to retrieve the value given the key.
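
A sketch of this pattern (the parameter name and resources are illustrative):

resource "random_password" "rds_master" {
  length = 20
}

resource "aws_ssm_parameter" "rds_master_password" {
  name  = "/rds/master_password"
  type  = "SecureString"
  value = random_password.rds_master.result
}

# Output the key, never the secret itself
output "rds_master_password_ssm_key" {
  value       = aws_ssm_parameter.rds_master_password.name
  description = "SSM Parameter Store key where the RDS master password is stored"
}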

Danger

Regardless of whether the secret is output or not, the fact that a secret is known to Terraform means that its value is stored in plaintext in the Terraform state file. Storing values in SSM Parameter Store or other places does not solve this problem. Even if you store the value in a secure place using some method other than Terraform, if you read it into Terraform, it will be stored in plaintext in the state file.

To keep the secret out of the state file, you must both store and retrieve the secret outside of Terraform. This is a limitation of Terraform that has been discussed practically since Terraform's inception, so a near-term solution is unlikely.

Use symmetrical names

We prefer to keep Terraform outputs symmetrical with the upstream resource or module as much as possible, with the exception of prefixes. This reduces the amount of entropy in the code and possible ambiguity, while increasing consistency. Below is an example of what not to do. The expected output name is user_secret_access_key: the other IAM user outputs in the upstream module are prefixed with user_, so we borrow the upstream's output name of secret_access_key and prefix it to get user_secret_access_key for consistency.

Terraform outputs should be symmetrical

Language

Use indented HEREDOC syntax

Using <<-EOT (as opposed to <<EOT without the -) ensures the content can be indented in line with the other code in the project. Note that EOT can be any uppercase string (e.g. CONFIG_FILE).

block {
  value = <<-EOT
    hello
    world
  EOT
}

Do not use HEREDOC for JSON, YAML or IAM Policies

There are better ways to achieve the same outcome using Terraform interpolations or resources.

For JSON, use a combination of a local and the jsonencode function.

For YAML, use a combination of a local and the yamlencode function.

For IAM Policy Documents, use the native aws_iam_policy_document data source.
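
For example, a minimal sketch of the JSON approach (the local names and values are illustrative):

locals {
  config = {
    log_level = "info"
    replicas  = 3
  }

  # Render the object to a JSON string instead of using a HEREDOC
  config_json = jsonencode(local.config)
}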

Do not use long HEREDOC configurations

Instead, use the templatefile function and move the configuration to a separate template file.
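
A minimal sketch, assuming a hypothetical user_data.sh.tftpl template exists in the module's templates directory:

locals {
  user_data = templatefile("${path.module}/templates/user_data.sh.tftpl", {
    cluster_name = "example-cluster" # variables passed into the template
  })
}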

Use terraform linting

Linting helps to ensure consistent code formatting, improves code quality, and catches common syntax errors.

Run terraform fmt before committing all code. Use a pre-commit hook to do this automatically. See Terraform Tips & Tricks.

Consider using tflint with the aws plugin.

Use CIDR math interpolation functions for network calculations

This reduces the barrier to entry for others to contribute and reduces the likelihood of human error. We also have a number of Terraform modules to aid in subnet calculations, since there are multiple strategies depending on the use-case.

Read more: https://www.terraform.io/docs/configuration/interpolation.html#cidrsubnet-iprange-newbits-netnum-
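
For example, a sketch using cidrsubnet to carve /24 subnets out of a /16 (the CIDR values are illustrative):

locals {
  vpc_cidr = "10.0.0.0/16"

  # cidrsubnet(prefix, newbits, netnum): /16 + 8 new bits = /24 networks
  public_subnet  = cidrsubnet(local.vpc_cidr, 8, 0) # "10.0.0.0/24"
  private_subnet = cidrsubnet(local.vpc_cidr, 8, 1) # "10.0.1.0/24"
}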

Use .editorconfig in all repos for consistent whitespace

Every mainstream IDE supports plugins for the .editorconfig standard to make it easier to enforce whitespace consistency.

We recommend adopting the whitespace convention of a particular language or project.

This is the standard .editorconfig that we use.

[*]
indent_style = space
indent_size = 2
trim_trailing_whitespace = true

## Override for Makefile
[{Makefile,makefile,GNUmakefile,Makefile.*}]
indent_style = tab
indent_size = 4

[*.sh]
indent_style = tab

Use minimum version pinning on all providers

Terraform's providers are constantly in flux. It is hard to know whether the module you write today will work with older versions of the provider APIs, and it is usually not worth the effort to find out. For that reason, we want to advertise the minimum version of the provider we have tested with.

With regard to reusable modules (as opposed to components, a.k.a. root modules), while it is possible that future Terraform or provider versions may introduce breaking changes, such changes, should they occur, can be mitigated by setting maximum version pins in root modules. Furthermore, we cannot test updated providers to find out about problems if we have placed upper limits on versions. Therefore, for all Cloud Posse reusable modules, we only place lower limits on versions.

In your root modules, you may want to include upper limits or pin to exact versions to avoid surprises. It is a trade-off between stability and ease of staying current you will have to evaluate for your own situation. You may prefer to use Terraform's lock files to pin to exact versions, while leaving the coded version requirements as minimums.
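
For example, a sketch of minimum version pinning (the version numbers are illustrative):

terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # lower bound only; no upper limit
    }
  }
}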

Use locals to identify and describe opaque values

Using locals makes code more descriptive and maintainable. Rather than using complex expressions as parameters to some Terraform resource, move the expression to a local and reference that local in the resource.
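
A minimal sketch of the pattern (the names and values are illustrative):

variable "production_enabled" {
  type     = bool
  default  = false
  nullable = false
}

locals {
  # Give the otherwise-opaque ternary a descriptive name
  log_retention_days = var.production_enabled ? 365 : 30
}

resource "aws_cloudwatch_log_group" "default" {
  name              = "/app/example"
  retention_in_days = local.log_retention_days
}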

State

Use remote state

Use Terraform to create state bucket

This requires a two-phased approach: first provision the bucket without remote state enabled, then enable remote state (e.g. an s3 {} backend block) and migrate the local state into it by simply rerunning terraform init. We recommend this strategy because it promotes using the best tool for the job and makes it easier to define requirements and use consistent tooling.

Using the terraform-aws-tfstate-backend module, it is easy to provision state buckets.

Use backend with support for state locking

We recommend using the S3 backend with DynamoDB for state locking.

Pro Tip: Using the terraform-aws-tfstate-backend module, this can be easily implemented.

Use terraform or atmos CLI to set backend parameters

Promote the reusability of a root module across accounts by avoiding hardcoded backend requirements. Instead, use the Terraform CLI or atmos to set the current context.

terraform {
  required_version = ">= 1.0.0"

  backend "s3" {}
}

Future versions of Terraform are usually at least mostly compatible with previous versions, and we want to be able to test modules with future versions to find out. Therefore, do not place an upper limit on required_version like ~> 0.12.26 or >= 0.13, < 0.15. Always use >= and enforce Terraform versions in your environment by controlling which CLI version you use.

Use encrypted S3 bucket with versioning, encryption and strict IAM policies

We recommend not commingling state in the same bucket as other data. This could cause the state to get overwritten or compromised. Note that the state contains cached values of all outputs. Consider isolating sensitive areas like production configuration and audit trails (separate buckets, separate organizations).

Pro Tip: Use the terraform-aws-tfstate-backend module to easily provision buckets for each stage.

Use Versioning on State Bucket

Use Encryption at Rest on State Bucket

Use .gitignore to exclude terraform state files, state directory backups and core dumps

.terraform
.terraform.tfstate.lock.info
*.tfstate
*.tfstate.backup

Use .dockerignore to exclude terraform state files from builds

Example:

**/.terraform*

Use a programmatically consistent naming convention

All resource names (e.g. things provisioned on AWS) must follow a consistent convention. The reason this is so important is that modules are frequently composed inside of other modules. Enforcing consistency increases the likelihood that modules can invoke other modules without colliding on resource names.

To enforce consistency, we require that all modules use the terraform-null-label module. With this module, users have the ability to change the way resource names are generated, such as by changing the order of parameters or the delimiter. While the module is opinionated on the parameters, it has proved invaluable as a mechanism for generating consistent resource names.
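
A sketch of typical usage (the namespace/stage/name values are illustrative):

module "label" {
  source  = "cloudposse/label/null"
  version = "0.25.0"

  namespace = "acme"
  stage     = "prod"
  name      = "app"
}

resource "aws_s3_bucket" "default" {
  bucket = module.label.id # "acme-prod-app"
  tags   = module.label.tags
}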

Module Design

Small Opinionated Modules

We believe that modules should do one thing very well. But in order to do that, they must be opinionated in their design. Simply wrapping Terraform resources for the purpose of modularizing code is not that helpful. Implementing a specific use-case of those resources is more helpful.

Composable Modules

Write all modules to be easily composable into other modules. This is how we're able to achieve economies of scale and stop re-inventing the same patterns over and over again.

Module Usage

Use Terraform registry format with exact version numbers

There are many ways to express a module's source. Our convention is to use Terraform registry syntax with an explicit version.

source  = "cloudposse/label/null"
version = "0.25.0"

The reason to pin to an explicit version rather than a range like >= 0.25.0 is that any update is capable of breaking something. Any changes to your infrastructure should be implemented and reviewed under your control, not applied blindly based on when you happened to deploy.

Info

Prior to Terraform v0.13, our convention was to use the pure git url:

source = "git::https://github.com/cloudposse/terraform-null-label.git?ref=tags/0.16.0"

Note that the ref always points explicitly to a tag pinned to a specific version. Dropping the tags/ qualifier means it could be a branch or a tag; we prefer to be explicit.