ECS Partial Task Definitions

This document describes what partial task definitions are and how we can use them to set up ECS services using Terraform and GitHub Actions.

The Problem

Managing ECS Services is challenging. Ideally, we want our services to be managed by Terraform so everything is living in code. However, we also want to update the task definition via GitOps as through the GitHub release lifecycle. This is challenging because Terraform can create the task definition, but if updated by the application repository, the task definition will be out of sync with the Terraform state.

Managing it entirely through Terraform means we cannot easily update the newly built image by the application repository unless we directly commit to the infrastructure repository, which is not ideal.

Managing it entirely through the application repository means we cannot codify the infrastructure and have to hardcode ARNs, secrets, and other infrastructure-specific configurations.

Introduction

ECS Partial task definitions is the idea of breaking the task definition into smaller parts. This allows for easier management of the task definition and makes it easier to update the task definition.

We do this by setting up Terraform to manage a portion of the task definition, and the application repository to manage another portion.

The Terraform (infrastructure) portion is created first. It will create an ECS Service in ECS, and then upload the task definition JSON to S3 as task-template.json.The application repository will have a task-definition.json git controlled, during the development lifecycle, the application repository will download the task definition from S3, merge the task definitions, then update the ECS Service with the new task definition. Finally, GitHub actions will update the S3 bucket with the deployed task definition under task-definition.json. If Terraform is planned again, it will use the new task definition as the base for the next deployment, thus not resetting the image or application configuration.

Pros

The benefit to using this approach is that we can manage the task definition portion in Terraform with the infrastructure, meaning secrets, volumes, and other ARNs can be managed in Terraform. If a filesystem ID updates we can re-apply Terraform to update the task definition with the new filesystem ID. The application repository can manage the container definitions, environment variables, and other application-specific configurations. This allows developers who are closer to the application to quickly update the environment variables or other configuration.

Cons

The drawback to this approach is that it is more complex than managing the task definition entirely in Terraform or the application repository. It requires more setup and more moving parts. It can be confusing for a developer who is not familiar with the setup to understand how the task definition is being managed and deployed.

This also means that when something goes wrong, it becomes harder to troubleshoot as there are more moving parts.

Getting Setup

Pre-requisites

Application Repository - Cloud Posse Example ECS Application
Infrastructure Repository
ECS Cluster - Cloud Posse Docs - Component.
ecs-service - Cloud Posse Docs - Component.
- Must use the Cloud Posse Component.
- v1.416.0 or later.
S3 Bucket - Cloud Posse Docs - Component.

Steps

Set up the S3 Bucket that will store the task definition.

This bucket should be in the same account as the ECS Cluster.

S3 Bucket Default Definition

components:
  terraform:
    s3-bucket/defaults:
      metadata:
        type: abstract
      vars:
        enabled: true
        account_map_tenant_name: core
        # Suggested configuration for all buckets
        user_enabled: false
        acl: "private"
        grants: null
        force_destroy: false
        versioning_enabled: false
        allow_encrypted_uploads_only: true
        block_public_acls: true
        block_public_policy: true
        ignore_public_acls: true
        restrict_public_buckets: true
        allow_ssl_requests_only: true
        lifecycle_configuration_rules:
          - id: default
            enabled: true
            abort_incomplete_multipart_upload_days: 90
            filter_and:
              prefix: ""
              tags: {}
            # Move to Glacier after 2 years
            transition:
              - storage_class: GLACIER
                days: 730
            # Never expire
            expiration: {}
            # Versioning isn't enabled, but these default values are still required
            noncurrent_version_transition:
              - storage_class: GLACIER
                days: 90
            noncurrent_version_expiration: {}

import:
  - catalog/s3-bucket/defaults

components:
  terraform:
    s3-bucket/ecs-tasks-mirror: #NOTE this is the component instance name.
      metadata:
        component: s3-bucket
        inherits:
          - s3-bucket/defaults
      vars:
        enabled: true
        name: ecs-tasks-mirror

Create an ECS Service in Terraform

Set up the ECS Service in Terraform using the ecs-service component. This will create the ECS Service and upload the task definition to the S3 bucket.

To enable Partial Task Definitions, set the variable s3_mirror_name to be the component instance name of the bucket to mirror to. For example s3-bucket/ecs-tasks-mirror
```
components:
  terraform:
    ecs-services/defaults:
      metadata:
        component: ecs-service
        type: abstract
      vars:
        enabled: true
        ecs_cluster_name: "ecs/cluster"
        s3_mirror_name: s3-bucket/ecs-tasks-mirror
```
Set up an Application repository with GitHub workflows.

An example application repository can be found here.

Two things need to be pulled from this repository:
- The task-definition.json file under deploy/task-definition.json
- The GitHub Workflows.
An important note about the GitHub Workflows, in the example repository they all live under .github/workflows. This is done so development of workflows can be fast, however we recommend moving the shared workflows to a separate repository and calling them from the application repository. The application repository should only contain the workflows main-branch.yaml, release.yaml and feature-branch.yml.

To enable Partial Task Definitions in the workflows, the call to cloudposse/github-action-run-ecspresso (link) should have the input mirror_to_s3_bucket set to the S3 bucket name. the variable use_partial_taskdefinition should be set to 'true'
Example GitHub Action Step
- name: Deploy uses: cloudposse/github-action-deploy-[email protected] continue-on-error: true if: ${{ steps.db_migrate.outcome != 'failure' }} id: deploy with: image: ${{ steps.image.outputs.out }} image-tag: ${{ inputs.tag }} region: ${{ steps.environment.outputs.region }} operation: deploy debug: false cluster: ${{ steps.environment.outputs.cluster }} application: ${{ steps.environment.outputs.name }} taskdef-path: ${{ inputs.path }} mirror_to_s3_bucket: ${{ steps.environment.outputs.s3-bucket }} use_partial_taskdefinition: "true" timeout: 10m

Operation

Changes through Terraform will not immediately be reflected in the ECS Service. This is because the task template has been updated, but whatever was in the task-definition.json file in the S3 bucket will be used for deployment.

To update the ECS Service after updating the Terraform for it, you must deploy through GitHub Actions. This will then download the new template and create a new updated task-definition.json to store in s3.

The Problem​

Introduction​

Pros​

Cons​

Getting Setup​

Pre-requisites​

Steps​

Operation​