Terraform Best Practices
These are the opinionated best-practices we follow at Cloud Posse. They are inspired by years of experience writing terraform and borrow from many other helpful resources, like those by HashiCorp.
See our general Best Practices which also apply to Terraform.
Variables
Use upstream module or provider variable names where applicable
When writing a module that accepts variable inputs, make sure to use the same names as the upstream to avoid confusion and ambiguity.
Use all lower-case with underscores as separators
Avoid introducing any other syntaxes commonly found in other languages, such as CamelCase or pascalCase. For consistency, we want all variables to look uniform. This is also in line with the HashiCorp naming conventions.
Use positive variable names to avoid double negatives
All variable inputs that enable/disable a setting should be formatted ...._enabled (e.g. encryption_enabled). It is acceptable for default values to be either false or true.
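For example, a minimal sketch of such a feature flag (the variable name, description, and default are illustrative):

variable "encryption_enabled" {
  type        = bool
  default     = true
  description = "Set to `false` to disable encryption"
}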
Use description field for all inputs
All variable inputs need a description field. When the field is provided by an upstream provider (e.g. terraform-aws-provider), use the same wording as the upstream docs.
Use sane defaults where applicable
Modules should be as turnkey as possible. The default value should ensure the most secure configuration (e.g. with encryption enabled).
Use nullable = false where appropriate
When passing an argument to a resource, passing null means to use the default value. Prior to Terraform version 1.1.0, passing null to a module input set that value to null rather than to the default value. Starting with Terraform version 1.1.0, variables can be declared as nullable = false, which:
- Prevents the variable from being set to null.
- Causes the variable to be set to the default value if null is passed in.
You should always use nullable = false for all variables which should never be set to null. This is particularly important for lists, maps, and objects, which, if not required, should default to empty values (i.e. {} or []) rather than null. It can also be useful to set strings to default to "" rather than null and set nullable = false. This will simplify the code since it can count on the variable having a non-null value. The default nullable = true never needs to be explicitly set. Leave variables with the default nullable = true if a null value is acceptable.
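A minimal sketch of this pattern, using an illustrative list input:

variable "allowed_cidr_blocks" {
  type        = list(string)
  default     = []
  nullable    = false
  description = "List of CIDR blocks allowed to connect"
}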
Use feature flags, list, or map inputs for optional functionality
All Cloud Posse modules should respect the null-label enabled feature flag, and when enabled is false, create no resources and generate null outputs (or, in the case of output lists, maps, and objects, empty values may be acceptable to avoid having other modules consuming the outputs fail due to having a null rather than empty value).
Optional functionality should be toggled in one of two ways:
- Use of a feature flag. Specifically, an input variable of type bool with a name ending in _enabled. Use this mechanism if the option requires no further configuration, e.g. iam_role_enabled or s3_bucket_enabled (see the sketch after this list). Feature flags should always be nullable = false, but the default value can be true or false as desired.
- If an optional feature requires further configuration, use a list or map input variable, with an empty input disabling the option and a non-empty input providing configuration. In this case, only use a separate feature flag if the list or map input may still cause problems due to relying on values computed during the plan phase. See Count vs. For Each and Terraform Errors When Planning for more information.
- It is never acceptable for an optional feature of the module to be toggled by the value of a string or number input variable, due to the issues explained in Terraform Errors When Planning. However, an optional feature of a resource may be toggled by such an input if that is the behavior of the resource and the input has the same name as the resource argument.
Use objects with optional fields for complex inputs
When a module requires a complex input, use an object with optional fields. This provides documentation and plan-time validation while avoiding type conversion errors, and allows for future expansion without breaking changes. Make as many fields as possible optional, provide defaults at every level of nesting, and use nullable = false if possible.
If you use an object with defaults as an input, Terraform will not give any indication if the user provides extra fields in the input object. This is particularly a problem if they misspelled an optional field name, because the misspelled field will be silently ignored, and the default value the user intended to override will silently be used. This is a limitation of Terraform.

Furthermore, there is no way to add any checks for this situation, because the input will have already been transformed (unexpected fields removed) by the time any validation code runs. This makes using an object a trade-off versus using separate inputs, which do not have this problem, or type = any, which allows you to write validation code to catch this problem and additional code to supply defaults for missing fields.
Reserve type = any for exceptional cases where the input is highly variable and/or complex, and the module is designed to handle it. For example, the configuration of a Datadog synthetic test is highly complex, and the Cloud Posse module accepts either an object derived from the synthetics_test resource schema or an object derived from the JSON output of the Datadog API. In this rare case, attempting to maintain a type definition would not only be overly complex, it would slow the adoption of future additions to the interface, and so type = any is appropriate.
Prefer a single object over multiple simple inputs for related configuration
When reviewing Cloud Posse modules as examples, you may notice that they often use a large number of input variables of simple types. This is because in the early development of Terraform, there was no good way to define complex objects with defaults. However, now that Terraform supports complex objects with field-level defaults, we recommend using a single object input variable with such defaults to group related configuration, taking into consideration the trade-offs listed in the above caution. This makes the interface easier to understand and use.
For example, prefer:
variable "eip_timeouts" {
type = object({
create = optional(string)
update = optional(string)
delete = optional(string, "30m")
}))
default = {}
nullable = false
}
rather than:
variable "eip_create_timeout" {
type = string
default = null
}
variable "eip_update_timeout" {
type = string
default = null
}
variable "eip_delete_timeout" {
type = string
default = "30m"
}
However, using an object with defaults versus multiple simple inputs is not without trade-offs, as explained in the above caution.
There are a few ways to mitigate this problem besides using separate inputs:
- If all the defaults are null or empty, you can use a map(string) input variable and use the keys function to check for unexpected fields (see the sketch after this list). This catches errors, but has the drawback that it does not provide documentation of what fields are expected.
- You can use type = any for inputs, but then you have to write the extra code to validate the input and supply defaults for missing fields. You should also document the expected fields in the input description.
- If all you are worried about is misspelled field names, you can make the correctly spelled field names required, ensuring they are supplied. Alternatively, if the misspelling is predictable, such as you have a field named minsize but people are likely to try to supply min_size, you can make the misspelled field name optional with a sentinel value and then check for that value in the validation code.
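A minimal sketch of the map(string) approach, assuming illustrative field names:

variable "timeouts" {
  type     = map(string)
  default  = {}
  nullable = false

  validation {
    # Reject any keys other than the ones the module knows about
    condition     = length(setsubtract(keys(var.timeouts), ["create", "update", "delete"])) == 0
    error_message = "Unexpected key in timeouts map. Allowed keys: create, update, delete."
  }
}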
Use custom validators to enforce custom constraints
Use the validation block to enforce custom constraints on input variables. A custom constraint is one that, if violated, would not otherwise cause an error, but would cause the module to behave in an unexpected way. For example, if the module takes an optional IPv6 CIDR block, you might receive that in a variable of list(string) for reasons explained here. Use a custom validator to ensure that the list has at most one element, because if it has more than one, the module will ignore the extra elements, while a reasonable person might expect them to be used in some way. At the same time, it is not necessary to use a custom validator to enforce that the input is a valid IPv6 CIDR block, because Terraform will already do that for you.
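A minimal sketch of such a validator (the variable name is illustrative):

variable "ipv6_cidr_block" {
  type     = list(string)
  default  = []
  nullable = false

  validation {
    condition     = length(var.ipv6_cidr_block) <= 1
    error_message = "Supply at most one IPv6 CIDR block; extra elements would be silently ignored."
  }
}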
This is perhaps better illustrated by an example of a pseudo-enumeration. Terraform does not support real enumerations, so they are typically implemented as strings with a limited number of acceptable values. For example, say you have a resource that takes a frequency input that is a string that must be either DAILY or WEEKLY. Even though the value of the string is very restricted, you should not use a custom validator to enforce that, for two reasons:
- Terraform (technically, the Terraform resource provider) will already enforce that the string is one of those two values and produce an informative error message if it is not.
- If you use a custom validator to enforce that the string is one of those two values, and then a later version of the resource adds a new option to the enumeration, such as "HOURLY", the custom validator will prevent the module from using the new value, even though the module would function perfectly were it not for the validator. This adds work and delay to the adoption of underlying enhancements to the resource, without providing enough benefit to be worth the extra effort.
Use variables for all secrets with no default value and mark them "sensitive"
All variable inputs for secrets must never define a default value. This ensures that terraform is able to validate user input. The exception to this is if the secret is optional and will be generated for the user automatically when left null or "" (empty).
Use sensitive = true to mark all secret variables as sensitive. This ensures that the value is not printed to the console.
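A minimal sketch (the variable name and description are illustrative):

variable "database_password" {
  type        = string
  description = "Password for the database master user"
  sensitive   = true
}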
Outputs
Use description field for all outputs
All outputs must have a description set. The description should be based on (or adapted from) the upstream terraform provider where applicable. Avoid simply repeating the variable name as the output description.
Use well-formatted snake case output names
Avoid introducing any other syntaxes commonly found in other languages such as CamelCase or pascalCase. For consistency, we want all variables to look uniform. It also makes code more consistent when using outputs together with terraform remote_state to access those settings from across modules.
Never output secrets
Secrets should never be outputs of modules. Rather, they should be written to secure storage such as AWS Secrets Manager, AWS SSM Parameter Store with KMS encryption, or S3 with KMS encryption at rest. Our preferred mechanism on AWS is using SSM Parameter Store. Values written to SSM are easily retrieved by other terraform modules, or even on the command-line using tools like chamber by Segment.io.
We are very strict about this in our components (a.k.a. root modules), the top-most module, because these sensitive outputs are easily leaked in CI/CD pipelines (see tfmask for masking secrets in output only as a last resort). We are less sensitive to this in modules that are typically nested inside of other modules.
Rather than outputting a secret, you may output plain text indicating where the secret is stored, for example RDS master password is in SSM parameter /rds/master_password. You may also want to have another output just for the key for the secret in the secret store, so the key is available to other programs which may be able to retrieve the value given the key.
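A minimal sketch of this pattern, assuming the secret is generated and written to SSM Parameter Store within the module (resource and parameter names are illustrative):

resource "random_password" "master" {
  length = 20
}

resource "aws_ssm_parameter" "rds_master_password" {
  name  = "/rds/master_password"
  type  = "SecureString"
  value = random_password.master.result
}

output "rds_master_password_ssm_key" {
  # Output only the key, never the secret itself
  value       = aws_ssm_parameter.rds_master_password.name
  description = "SSM parameter key where the RDS master password is stored"
}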
Regardless of whether the secret is output or not, the fact that a secret is known to Terraform means that its value is stored in plaintext in the Terraform state file. Storing values in SSM Parameter Store or other places does not solve this problem. Even if you store the value in a secure place using some method other than Terraform, if you read it into Terraform, it will be stored in plaintext in the state file.
To keep the secret out of the state file, you must both store and retrieve the secret outside of Terraform. This is a limitation of Terraform that has been discussed practically since Terraform's inception, so a near-term solution is unlikely.
Use symmetrical names
We prefer to keep terraform outputs symmetrical as much as possible with the upstream resource or module, with the exception of prefixes. This reduces the amount of entropy in the code or possible ambiguity, while increasing consistency. Below is an example of what not to do. The expected output name is user_secret_access_key. This is because the other IAM user outputs in the upstream module are prefixed with user_, and then we should borrow the upstream's output name of secret_access_key to become user_secret_access_key for consistency.
Language
Use indented HEREDOC syntax
Using <<-EOT (as opposed to <<EOT without the -) ensures the code can be indented in line with the other code in the project. Note that EOT can be any uppercase string (e.g. CONFIG_FILE).
block {
  value = <<-EOT
  hello
  world
  EOT
}
Do not use HEREDOC for JSON, YAML or IAM Policies
There are better ways to achieve the same outcome using terraform interpolations or resources:
- For JSON, use a combination of a local and the jsonencode function (see the sketch after this list).
- For YAML, use a combination of a local and the yamlencode function.
- For IAM Policy Documents, use the native iam_policy_document data source.
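For example, a minimal sketch of rendering JSON from a local, and of a native IAM policy document (names and values are illustrative):

locals {
  config = {
    name    = "example"
    enabled = true
  }
  # Render the object to JSON instead of hand-writing a HEREDOC
  config_json = jsonencode(local.config)
}

data "aws_iam_policy_document" "example" {
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-bucket/*"]
  }
}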
Do not use long HEREDOC configurations
Instead, use the templatefile function and move the configuration to a separate template file.
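A minimal sketch, assuming a hypothetical template file at templates/user_data.sh.tpl:

locals {
  # Variables referenced in the template are passed in the second argument
  user_data = templatefile("${path.module}/templates/user_data.sh.tpl", {
    cluster_name = "example"
  })
}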
Use terraform linting
Linting helps to ensure consistent code formatting, improves code quality, and catches common syntax errors.
Run terraform fmt before committing all code. Use a pre-commit hook to do this automatically. See Terraform Tips & Tricks.
Consider using tflint with the aws plugin.
Use CIDR math interpolation functions for network calculations
This reduces the barrier to entry for others to contribute and reduces the likelihood of human error. We have a number of terraform modules as well to aid in the subnet calculations, since there are multiple strategies depending on the use-case.
- https://github.com/cloudposse/terraform-aws-dynamic-subnets
- https://github.com/cloudposse/terraform-aws-multi-az-subnets
- https://github.com/cloudposse/terraform-aws-named-subnets
Read more: https://www.terraform.io/docs/configuration/interpolation.html#cidrsubnet-iprange-newbits-netnum-
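For example, a minimal sketch using cidrsubnet to carve /24 subnets out of a /16 (the CIDR and subnet count are illustrative):

locals {
  vpc_cidr = "10.0.0.0/16"

  # newbits = 8 turns the /16 into /24 networks; netnum selects which one
  subnet_cidrs = [for i in range(4) : cidrsubnet(local.vpc_cidr, 8, i)]
  # Yields: ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}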
Use .editorconfig in all repos for consistent whitespace
Every mainstream IDE supports plugins for the .editorconfig standard to make it easier to enforce whitespace consistency. We recommend adopting the whitespace convention of a particular language or project. This is the standard .editorconfig that we use.
[*]
indent_style = space
indent_size = 2
trim_trailing_whitespace = true
## Override for Makefile
[{Makefile,makefile,GNUmakefile,Makefile.*}]
indent_style = tab
indent_size = 4
[*.sh]
indent_style = tab
Use minimum version pinning on all providers
Terraform's providers are constantly in flux. It is hard to know if the module you write today will work with older versions of the provider APIs, and usually not worth the effort to find out. For that reason we want to advertise the minimum version of the provider we have tested with.
With regard to reusable modules (as opposed to components, a.k.a. root modules), while it is possible that future Terraform or provider versions may introduce breaking changes, such changes, should they occur, can be mitigated by setting maximum version pins in root modules. Furthermore, we cannot test updated providers to find out about problems if we have placed upper limits on versions. Therefore, for all Cloud Posse reusable modules, we only place lower limits on versions.
In your root modules, you may want to include upper limits or pin to exact versions to avoid surprises. It is a trade-off between stability and ease of staying current you will have to evaluate for your own situation. You may prefer to use Terraform's lock files to pin to exact versions, while leaving the coded version requirements as minimums.
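A minimal sketch of minimum version pinning (the versions shown are illustrative):

terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}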
Use locals to identify and describe opaque values
Using locals makes code more descriptive and maintainable. Rather than using complex expressions as parameters to some terraform resource, move that expression to a local and reference the local in the resource.
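A minimal sketch (the names and the expression are illustrative):

variable "stage" {
  type = string
}

locals {
  # Give the otherwise opaque expression a descriptive name
  is_production = contains(["prod", "production"], var.stage)
}

resource "aws_s3_bucket" "logs" {
  bucket        = "example-logs"
  force_destroy = !local.is_production
}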
State
Use remote state
Use Terraform to create state bucket
This requires a two-phased approach, whereby you first provision the bucket without the remote state enabled. Then enable remote state (e.g. s3 {}) and import the remote state by simply rerunning terraform init. We recommend this strategy because it promotes using the best tool for the job and makes it easier to define requirements and use consistent tooling.
Using the terraform-aws-tfstate-backend module, it is easy to provision state buckets.
Use backend with support for state locking
We recommend using the S3 backend with DynamoDB for state locking.
Pro Tip: This can be easily implemented using the terraform-aws-tfstate-backend module.
Use terraform or atmos CLI to set backend parameters
Promote the reusability of a root module across accounts by avoiding hardcoded backend requirements. Instead, use the Terraform CLI or atmos to set the current context.
terraform {
  required_version = ">= 1.0.0"

  backend "s3" {}
}
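With an empty backend block like this, the actual parameters can be supplied at init time, for example (bucket, key, and table names are illustrative):

terraform init \
  -backend-config="bucket=example-tfstate-bucket" \
  -backend-config="key=prod/terraform.tfstate" \
  -backend-config="region=us-east-1" \
  -backend-config="dynamodb_table=example-tfstate-lock"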
Future versions of Terraform are usually at least mostly compatible with previous versions, and we want to be able to test the modules with future versions to find out. Therefore, do not place an upper limit on the required_version like ~> 0.12.26 or >= 0.13, < 0.15. Always use >= and enforce Terraform versions in your environment by controlling which CLI you use.
Use encrypted S3 bucket with versioning, encryption and strict IAM policies
We recommend not commingling state in the same bucket as other data. This could cause the state to get overwritten or compromised. Note, the state contains cached values of all outputs. Consider isolating sensitive areas like production configuration and audit trails (separate buckets, separate organizations).
Pro Tip: Use the terraform-aws-tfstate-backend module to easily provision buckets for each stage.
Use Versioning on State Bucket
Use Encryption at Rest on State Bucket
Use .gitignore to exclude terraform state files, state directory backups and core dumps
.terraform
.terraform.tfstate.lock.info
*.tfstate
*.tfstate.backup
Use .dockerignore to exclude terraform state files from builds
Example:
**/.terraform*
Use a programmatically consistent naming convention
All resource names (e.g. things provisioned on AWS) must follow a consistent convention. The reason this is so important is that modules are frequently composed inside of other modules. Enforcing consistency increases the likelihood that modules can invoke other modules without colliding on resource names.
To enforce consistency, we require that all modules use the terraform-null-label module. With this module, users have the ability to change the way resource names are generated, such as by changing the order of parameters or the delimiter. While the module is opinionated on the parameters, it has proved invaluable as a mechanism for generating consistent resource names.
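A minimal sketch of typical usage (the inputs shown are illustrative):

module "label" {
  source  = "cloudposse/label/null"
  version = "0.25.0"

  namespace = "eg"
  stage     = "prod"
  name      = "app"
}

resource "aws_s3_bucket" "default" {
  # The label module generates a consistent id and tags for the resource
  bucket = module.label.id
  tags   = module.label.tags
}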
Module Design
Small Opinionated Modules
We believe that modules should do one thing very well. But in order to do that, it requires being opinionated on the design. Simply wrapping terraform resources for the purposes of modularizing code is not that helpful. Implementing a specific use-case of those resources is more helpful.
Composable Modules
Write all modules to be easily composable into other modules. This is how we're able to achieve economies of scale and stop re-inventing the same patterns over and over again.
Module Usage
Use Terraform registry format with exact version numbers
There are many ways to express a module's source. Our convention is to use Terraform registry syntax with an explicit version.
source = "cloudposse/label/null"
version = "0.25.0"
The reason to pin to an explicit version rather than a range like >= 0.25.0 is that any update is capable of breaking something. Any changes to your infrastructure should be implemented and reviewed under your control, not blindly automatic based on when you deployed it.
Prior to Terraform v0.13, our convention was to use the pure git url:
source = "git::https://github.com/cloudposse/terraform-null-label.git?ref=tags/0.16.0"
Note that the ref always points explicitly to a tag pinned to a specific version. Dropping the tags/ qualifier means it could be a branch or a tag; we prefer to be explicit.