Proposed: Use More Flexible Resource Labels

Date: 19 Apr 2022

Needs Update!

The content in this ADR may be out of date and in need of an update. For questions, please reach out to Cloud Posse.

No pushback from the team. Overall, we know we need to support arbitrary label fields, and we don't like how we use environment to represent region. Note that this suggestion is also consistent with our filesystem organization: <namespace>/<tenant>/<stage>/<environment>. Internal discussion reference
Decision is to adopt: <namespace>-<tenant>-<stage>-<environment>

Status

PROPOSAL

Problem

Currently, we use a fixed set of labels, dictated by the terraform-null-label module, for labeling everything provisioned by IaC. This set of labels is also treated specially by atmos, and it is used to label IAM roles and both atmos and Spacelift “stacks”.

The choice of label names has proven to be confusing and unpopular.
The set of labels is fixed. When we added “tenant” as a possible label it was a major undertaking to upgrade terraform-null-label to handle it.
Because the label names are fixed, and atmos does not have access to the outputs of terraform-null-label (because atmos is written in Go and not Terraform), adding or changing label names requires code changes to both terraform-null-label and atmos.
The use of name as a label name is a particular problem as it conflicts with AWS' usage of the tag key “Name” as the UI display name of a resource.
We have come to rely on atmos as a tool, and it needs to parse labels to determine the Atmos “stack” name, the Terraform backend configuration, the Terraform workspace name, the EKS cluster name, and possibly other resources. However, atmos is written in Go and cannot use terraform-null-label, which is a Terraform module, to generate these items. Nevertheless, we want some of them to be available in Terraform so that components can access configuration data generated by other Terraform components.
We have some components, such as Kubernetes deployments, that have additional configuration labels/variants, such as color for blue/green deployments or ip for IPv4/IPv6 variants. We would like to be able to flexibly use or not use these additional labels to distinguish deployments where applicable, without requiring them for other components (e.g. cloudtrail_bucket) where they are not needed. Currently we are doing this by manually altering the component names to include the variant labels, but this practice is not DRY and eliminates many of the advantages atmos gives us through importing configurations, since all configurations are, in the end, tied to a component name. The proper Atmos model is to have a single component name with variable Terraform workspaces selected by variable labels.

Context

Early on, Cloud Posse decided that consistent labeling was important and implemented a mechanism for it in the form of terraform-null-label. (terraform-null-label, or null-label for short, was first released in 2017.) At the time it was first released, Terraform itself was in the early stages of development and lacked many essential features, so the capabilities of the module were limited. In particular, there was no way to iterate over lists or maps. This imposed a practical requirement that inputs to null-label be known in advance (hardcoded).

The original set of labels was:

namespace
stage
name

Over time, we added

environment
tenant

...to get to the current set of 5 labels. (null-label also accepts a list of attributes and a map of tags, which are outside the scope of this ADR.)

Unfortunately, except for tenant, there are issues with all of these label names.

namespace collides with Kubernetes' use of “namespace” as a mechanism for isolating groups of resources within a single cluster, and we have had problems due to the $NAMESPACE shell variable being set to indicate our version of “namespace” while being interpreted by some tools as Kubernetes' version.
environment is not bad, but a lot of people use it in a way we do not use it. We use it as a region code (abbreviation for a particular AWS Region) while most people use it to indicate a functional role or AWS account, such as “production” or “staging”.
stage is a bit confusing, and in the end more generic than we allow. We use it the way many people use “environment”, but because we typically have a 1-to-1 mapping of stage to AWS Account, our code frequently assumes that “stage” is the same as “account”. This breaks, however, in multi-tenant environments where tenants have multiple accounts, such as tenant-dev, tenant-stage and tenant-production.
name is a problem in that AWS reserves that for the tag key whose value is displayed in the web UI. For all our other labels, we add a tag with the (capitalized) label name as the tag key and the (normalized) label value as the tag value. We make an exception for “Name”, setting that value to the id (the fully formed identifier combining all the labels), not the value of the name label, which confuses everyone.
Atmos separately has (in atmos.yaml) configuration for helm_aws_profile_pattern, EKS cluster_name_pattern, and Stack name_pattern, along with separate configuration for Component name (directory) and Terraform workspace name. Currently these are either completely hard coded (Component name) or are configured using a template based on the above listed special label names, which works completely separately from null-label and must be kept in sync.
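
The tagging behavior described for the name label can be illustrated with a small Terraform sketch. This is not the null-label implementation; the local names and sample values are hypothetical and only show why the “Name” tag ends up holding the id rather than the value of the name label.

```hcl
locals {
  # Hypothetical, already-normalized label values.
  labels = {
    namespace   = "cplive"
    environment = "ue1"
    stage       = "dev"
    name        = "eks"
  }

  # The id combines all the labels in order.
  id = join("-", [
    for k in ["namespace", "environment", "stage", "name"] : local.labels[k]
  ])

  # One tag per label, keyed by the capitalized label name.
  label_tags = { for k, v in local.labels : title(k) => v }

  # The exception: "Name" is overridden with the full id.
  tags = merge(local.label_tags, { Name = local.id })
}

output "tags" {
  value = local.tags
}
```

In this sketch, tags["Name"] is "cplive-ue1-dev-eks" even though the name label is "eks", while tags["Stage"] is simply "dev".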

Now (April 2022), Terraform version 1.1 has several features that enable us to use an arbitrary set of label names. On the drawing board (but for no earlier than Terraform version 1.3) is also an additional feature we would like, allowing input objects to have optional attributes (see https://github.com/hashicorp/terraform/pull/31154). This suggests we can create a new null-label version with 1.1 features and enhance it again after optional attributes have been released.

Considered Options

Option 1:

Null Label

Going forward, I suggest Cloud Posse use different label names in its engagements:

company instead of namespace, to provide a global prefix that makes the final ID unique despite our reuse of all the other label values
region_code or reg instead of environment to indicate the abbreviated AWS Region
tenant can remain, or be changed to ou for organizational unit.
env instead of stage, to indicate the function of the environment, such as “development”, “sandbox”, or “production”. In environments where env always equals account, we would specify only one and have the other be a generated label (see below). Which one to specify should be based on a survey of clients' preferences.
account instead of stage to indicate the name of the AWS account. account would never be specified directly; it would generally be either env or tenant-env.
component_name instead of name (and to avoid overloading name used by AWS and component which has special meaning to atmos).
Possibly an additional label component, such as net or ip, that can be used to allow us to create IPv4 and IPv6 versions of components like EKS clusters or ALBs in the same account and region and yet still distinguish them. This label component would ideally have an optional attribute that removes the delimiter before it, so if name is eks and ip is 6, we can get a name like {namespace}-{tenant}-{environment}-{stage}-eks6-cluster instead of {namespace}-{tenant}-{environment}-{stage}-eks-6-cluster.
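
As a rough illustration of the "remove the delimiter before it" idea, the following Terraform sketch joins labels while letting selected labels attach directly to the previous one. All local names here (label_values, order, no_delimiter, suffix) are hypothetical and not part of null-label; the leading {namespace}-{tenant}-{environment}-{stage} labels are omitted for brevity.

```hcl
locals {
  # Hypothetical inputs: label values, their order, and the labels that
  # should be attached to the previous label without a delimiter.
  label_values = { name = "eks", ip = "6", suffix = "cluster" }
  order        = ["name", "ip", "suffix"]
  no_delimiter = ["ip"]
  delimiter    = "-"

  # Prefix each value with the delimiter unless its label opts out.
  parts = [
    for n in local.order :
    "${contains(local.no_delimiter, n) ? "" : local.delimiter}${local.label_values[n]}"
  ]

  # Concatenate, then drop the delimiter that precedes the first label.
  short_id = trimprefix(join("", local.parts), local.delimiter)
}

output "short_id" {
  value = local.short_id
}
```

With these inputs the result is "eks6-cluster"; leaving no_delimiter empty yields "eks-6-cluster" instead.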

To facilitate this, I suggest an overhaul of terraform-null-label. We can use the existing label_order input to take an arbitrary list of label names. We can deprecate the existing hard-coded label names in favor of a new input, a map(string) where the keys are label names and the values are label values. (This is exactly like the tags input, except that tags are not altered while labels are.) The new input could be named either label_input or labels. Naming it label_input would let us have an output named labels containing the normalized label values and a separate output named label_input preserving the input untransformed. Naming it labels would work if either we do not care about the output labels being different from the input, or we are satisfied that module.this.labels is normalized while module.this.context.labels gets you back exactly what was input, as is currently the case with the special label names (e.g. module.this.stage vs. module.this.context.stage).

Additionally, we deprecate the existing descriptor_format input and descriptors output in favor of a label_generator input which adds labels to the labels output. This would allow us to have an account output that by default is the same as the env or stage output (and for that matter, allow us to preserve the namespace, environment, and name outputs even though we have stopped using them as inputs), and also handle the case where account is a composite of 2 labels like tenant-dev.

Future Possibilities

Once Terraform supports optional object members, I would propose label_generator be a map(object) that has:

key is name of label to generate
labels = list(string) list of labels to construct the label from, in order
delimiter = optional(string) the delimiter to use when joining the labels, defaults to label delimiter
value_case = optional(string) the case formatting of the label values, one of lower, title, upper or none (no transformation), defaults to label_value_case
regex_remove_chars = optional(string) regex specifying characters to remove from the value, defaults to top level regex_replace_chars (which I would deprecate and replace with regex_remove_chars since we do not provide the capability to replace the characters and no one has asked for that).
length_limit = optional(number) the limit on the length of the value, or 0 for unlimited, defaults to 0.
truncation_mode = optional(string) one of "beginning", "middle", or "end". Where to place the hash that substitutes for the extra characters in the label. Allows you to decide to truncate foo-bar-baz as foo-bar-<hash> (the only mode we allow today), <hash>-bar-baz, or foo-<hash>-baz. I would also add id_truncation_mode to the top-level and default truncation_mode to whatever id_truncation_mode is set to. Unfortunately, id_truncation_mode would need to default to end for backward compatibility, but I think middle is the better default.

locals {
  # Create a default format map so it can be reused, optionally with changes applied.
  # This is in part to deal with the Terraform requirement that all values of a map
  # must have the exact same type.
  default_format = {
    delimiter = "-"
    value_case = "lower"
    regex_remove_chars = "/[^a-zA-Z0-9-]/"
    length_limit = 64
    truncation_mode = "middle"
  }
}

# Advanced example, more like what we would probably use
module "this" {
  source = "cloudposse/label/null"

  label_order = [ "org", "ou", "reg", "env", "component"]
  label_format = local.default_format

  label_generator = {
    # This is how we would generate the "id" output if it were not hardcoded for backward compatibility
    id = merge(local.default_format, {
      labels = [ "org", "ou", "reg", "env", "component"]
    })

    # Generate an output named "account" of the form "${ou}_${env}"
    account = merge(local.default_format, {
      # Specify the value inputs and the order
      labels = ["ou", "env"]
      # Change the delimiter to "_" instead of "-"
      delimiter = "_"
      # By default, we remove underscores, so we need to alter the list of characters to remove
      regex_remove_chars = "/[^a-zA-Z0-9-_]/"
    })
  }

  # In practice, the "label_values" input would be generated by Atmos
  #   For example, in stacks/orgs/cplive/_defaults.yaml
  #   vars:
  #     label_values:
  #       org: cplive
  label_values = merge({ component = var.component_name }, {
    org = "cplive",
    ou  = "plat",
    reg = "ue1"
  })
}

locals {
  id  = module.this.id
  org = module.this.labels["org"]

  account_name = module.this.labels["account"]
}

# Simpler example
module "this" {
  source = "cloudposse/label/null"

  label_order = [ "org", "ou", "reg", "env", "component"]
  label_format = local.default_format

  label_generator = {
    account = {
      labels = ["ou", "env"]
      delimiter = "_"
      regex_remove_chars = "/[^a-zA-Z0-9-_]/"
    }
  }

  label_values = {
    org = "cplive",
    ou  = "plat",
    reg = "ue1"
  }
}

# Simplest example
module "this" {
  source = "cloudposse/label/null"

  label_order = [ "org", "ou", "reg", "env", "component"]
  label_format = local.default_format

  label_values = {
    org = "cplive",
    ou  = "plat",
    reg = "ue1"
  }
}

# In stacks/orgs/cplive/_defaults.yaml using current labels
# (Compare to https://github.com/cloudposse/infra-live/blob/8754dc3d1e938c31387bc704ef361fc476fe28e5/stacks/orgs/cplive/_defaults.yaml#L9-L28 )
vars:
  label_values:
    namespace: cplive
  label_order:
  - namespace
  - tenant
  - environment
  - stage
  - name
  - attributes
  label_format: &default_label_format
    delimiter: "-"
    value_case: "lower"
    regex_remove_chars: "/[^a-zA-Z0-9-]/"
    length_limit: 64
    truncation_mode: "middle"
  label_generator:
    account_name:
      <<: *default_label_format
      labels:
      - tenant
      - stage
    stack:
      <<: *default_label_format
      labels:
      - tenant
      - environment
      - stage

# In stacks/orgs/cplive/core/_defaults.yaml
vars:
  label_values:
    tenant: core
# et cetera

For now (April 2022), with no ETA on that feature, I would limit label_generator to map(list(string)):

key is name of label to generate
value = list(string) list of labels to construct the label from, in order

The generated label will be the normalized values of the labels named in the list, in that order, joined by the same delimiter used for the id.
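
A minimal sketch of how this limited map(list(string)) form could work, using only features available in Terraform 1.1. It is an illustration under stated assumptions, not the null-label implementation: the variable names follow the examples in this ADR, the sample values are made up, and normalization is simplified to lowercasing plus removing disallowed characters.

```hcl
variable "label_values" {
  type = map(string)
  default = {
    org       = "CPLive"
    ou        = "plat"
    reg       = "ue1"
    env       = "dev"
    component = "eks"
  }
}

variable "label_order" {
  type    = list(string)
  default = ["org", "ou", "reg", "env", "component"]
}

variable "label_generator" {
  type = map(list(string))
  default = {
    account = ["ou", "env"]
  }
}

locals {
  delimiter = "-"

  # Simplified normalization: lowercase and strip disallowed characters.
  normalized = {
    for k, v in var.label_values :
    k => lower(replace(v, "/[^a-zA-Z0-9-]/", ""))
  }

  # The id joins the normalized values in label_order, skipping empty ones.
  id = join(local.delimiter, compact([
    for n in var.label_order : lookup(local.normalized, n, "")
  ]))

  # Each generated label joins the named labels with the same delimiter.
  generated = {
    for out, names in var.label_generator :
    out => join(local.delimiter, compact([
      for n in names : lookup(local.normalized, n, "")
    ]))
  }

  # Input labels and generated labels are exposed through one map.
  labels = merge(local.normalized, local.generated)
}

output "id" {
  value = local.id
}

output "labels" {
  value = local.labels
}
```

With the defaults above, id is "cplive-plat-ue1-dev-eks" and labels["account"] is "plat-dev".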

Likewise, we would deprecate the named outputs (and descriptors) in favor of a labels output, which is a map of label names to normalized label values. So instead of module.this.stage we would reference module.this.labels["stage"].

Atmos Changes

We need to update atmos to support a flexible set of labels.

Atmos option 1

Instead of specifying a template for each configuration value, such as cluster_name_pattern, Atmos could be configured with the name of a labels output to use in its place (e.g. cluster for cluster_name_pattern). Both atmos and Terraform would then have access to exactly the same information in the same way (e.g. module.this.labels["cluster"]).

Atmos option 2

Right now, namespace, stage, name, tenant, and environment are top-level labels.

We could instead put these under a new section, either in the stacks or in atmos.yaml:

terraform:
  backend:
    backend_pattern: "{foo}-{bar}-{baz}"
labels:
  - foo
  - bar
  - baz

For compatibility with null-label, atmos should populate the labels based on the fully merged vars section of the stack configuration, supporting both the old variables as it does now and the new label_input (or whatever we call it) map.

Option 2:

module "this" {
  label = "camelcase(id)-lowercase(name)-uppercase(company)" # camelcaseHyphenFoobarFormat(....)
  context = var.context
}

Option 3:

We predefine a named set of formats and allow additional custom formats to be defined.

# Simpler example
module "this" {
  source = "cloudposse/label/null"

  label_order = [ "org", "ou", "reg", "env", "component"]
  label_format = "kebab"

  label_generator = {
    account = {
      labels = ["ou", "env"]
      format = "snake"
    }
  }

  label_values = {
    org = "cplive",
    ou  = "plat",
    reg = "ue1"
  }
}
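
One way the named formats in Option 3 could be resolved inside the module is sketched below. The contents of the "kebab" and "snake" formats and the custom_formats variable are assumptions made for illustration; the ADR does not define them.

```hcl
variable "custom_formats" {
  # Hypothetical input for user-defined formats.
  type = map(object({
    delimiter  = string
    value_case = string
  }))
  default = {}
}

locals {
  # Hypothetical predefined formats.
  named_formats = {
    kebab = { delimiter = "-", value_case = "lower" }
    snake = { delimiter = "_", value_case = "lower" }
  }

  # Custom formats are merged in last, so they can add to or override the
  # predefined ones.
  formats = merge(local.named_formats, var.custom_formats)

  # Resolve the format requested for the "account" label by name.
  account_format = local.formats["snake"]

  # Apply the format to sample values for the "ou" and "env" labels.
  account = join(local.account_format.delimiter, [
    for v in ["plat", "Dev"] :
    local.account_format.value_case == "lower" ? lower(v) : v
  ])
}

output "account" {
  value = local.account
}
```

With these assumptions the account output is "plat_dev". Referencing a format name that is neither predefined nor supplied through custom_formats would fail at plan time, which may or may not be the desired behavior.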

Decision

DECIDED:
