Proposed: Use GitHub Actions with Atmos
Date: 14 Apr 2022
The content in this ADR may be out of date and in need of an update. For questions, please reach out to Cloud Posse.
- The proposal has already been adopted, and this ADR needs to be updated to reflect the final decision.
Status
DRAFT
Problem
Smaller, bootstrappy startups don’t have the budget for Spacelift. Some things we can do in atmos (e.g. workflows) are also easier to implement conventionally than with the Rego-based policy approach in Spacelift.
Context
Considered Options
- Update atmos to natively support `git diff` to determine what changed at a branch HEAD commit relative to any other BASE commit, so that only the affected YAML stack configurations are planned and applied
  - For our GitHub Action, we’ll want to detect the last commit successfully applied against the default branch. This can be accomplished by querying the GitHub API.
- Provision a private, restricted S3 bucket to store terraform `planfiles` (or use DynamoDB? It would facilitate locking, discovery of the latest planfile, and invalidation of old planfiles)
  - Use lifecycle rules to expunge old plans
- Institute branch protections with `CODEOWNERS`
- Implement support for manual deployment approvals: https://docs.github.com/en/enterprise-cloud@latest/actions/managing-workflow-runs/reviewing-deployments
- Implement environment protection rules: https://docs.github.com/en/enterprise-cloud@latest/actions/deployment/targeting-different-environments/using-environments-for-deployment#environment-protection-rules
- Create private, shared GitHub Actions workflows for:
  - atmos terraform plan
    - Develop a GitHub Action that uses a service account role for the runner (or OIDC) to run a terraform plan
    - Store the planfile in S3 (or DynamoDB)
    - Comment on the PR with a pretty depiction of what will happen for every affected stack
    - Upon merge to main, trigger a GitHub `deployment` with an approval step. See https://github.com/suzuki-shunsuke/tfcmt for inspiration
    - Include support for `terraform plan -destroy` for deleted stacks or components
  - atmos terraform apply
    - Upon approval, trigger an “apply” by pulling down the corresponding “planfile” artifact from S3. There may be more than one planfile; abort if there is none
    - Run atmos terraform apply on the planfiles in the appropriate order
    - Discard planfiles upon completion
- Workflows for complex, coordinated sequences of operations (e.g. to bring up a full stack, one component at a time)
- Conflict resolution
  - Locking strategy for components / planfiles. How do we determine the latest planfile?
  - Implement a GitHub Action that, when used together with branch protections, prevents merging of pull requests if other unconfirmed changes are pending deployment and affect the same stacks.
    - This is the most complicated part of the solution. A “one cancels all” type of strategy will probably need to be implemented. We have to ensure planfiles are applied in sequential order, and any planfile must be invalidated (or replanned) if upstream commits are made affecting its stacks
- Drift Detection
  - Implement cron-based automatic replans of all infrastructure under management. https://docs.github.com/en/enterprise-cloud@latest/actions/using-workflows/workflow-syntax-for-github-actions#onschedule
  - Trigger webhook callbacks when drift is detected (e.g. escalate to Datadog)
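The `git diff` option at the top of this list could be prototyped outside atmos before being built in. The following is a minimal sketch, not the atmos implementation. It assumes that a stack named `x` is defined in `stacks/x.yaml`, that component sources live under `components/terraform/<name>/`, and that a hypothetical `stack_components` mapping (stack name to the components it uses) has already been derived from the stack configuration. All function names are illustrative.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base, head):
    """Return the paths changed between two commits via `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def affected(files, stack_components,
             stacks_dir="stacks", components_dir="components/terraform"):
    """Map changed paths to the (stack, component) pairs that need a plan.

    stack_components is a hypothetical mapping {stack_name: [component, ...]}.
    The directory layout is an assumption; adjust it to the repository.
    """
    stacks_parts = PurePosixPath(stacks_dir).parts
    comp_parts = PurePosixPath(components_dir).parts
    changed_stacks, changed_components = set(), set()

    for f in files:
        parts = PurePosixPath(f).parts
        if parts[:len(stacks_parts)] == stacks_parts and len(parts) > len(stacks_parts):
            # "stacks/dev.yaml" -> stack name "dev"
            rel = PurePosixPath(*parts[len(stacks_parts):])
            changed_stacks.add(str(rel.with_suffix("")))
        elif parts[:len(comp_parts)] == comp_parts and len(parts) > len(comp_parts):
            # "components/terraform/vpc/main.tf" -> component "vpc"
            changed_components.add(parts[len(comp_parts)])

    pairs = set()
    for stack, components in stack_components.items():
        for component in components:
            if stack in changed_stacks or component in changed_components:
                pairs.add((stack, component))
    return sorted(pairs)
```

In a workflow this would be called as `affected(changed_files(base_sha, head_sha), stack_components)`, where `base_sha` is the last successfully applied commit. The sketch does not follow stack imports, so a change to a shared catalog file would not be traced to the stacks that import it; a real implementation would need to resolve those.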
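For the lifecycle rules on the planfile bucket, the configuration is small enough to show in full. The sketch below only builds the rule document; the `planfiles/` prefix and the seven-day retention are assumptions, not values from this ADR.

```python
def planfile_lifecycle(prefix="planfiles/", expire_days=7):
    """Build an S3 lifecycle configuration that expires old planfiles.

    The prefix and retention period are illustrative defaults.
    """
    return {
        "Rules": [
            {
                "ID": "expire-old-planfiles",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Delete current objects after the retention period
                "Expiration": {"Days": expire_days},
                # Also remove superseded versions if bucket versioning is on
                "NoncurrentVersionExpiration": {"NoncurrentDays": expire_days},
            }
        ]
    }
```

With boto3, the returned dictionary can be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`. Expiry by age is only a backstop: it does not replace explicit invalidation of planfiles that have been superseded by newer commits.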
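The locking and invalidation questions under conflict resolution come down to deciding which stored planfile, if any, is still safe to apply. Below is a minimal sketch of one possible “one cancels all” rule. It assumes each planfile is recorded with the stack, component, and commit it was planned at, and that commits on the default branch can be given an increasing sequence number. The `PlanRecord` type, the S3 key layout, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanRecord:
    stack: str
    component: str
    commit_seq: int  # position of the planned commit on the default branch
    sha: str

    @property
    def key(self):
        # Hypothetical S3 object key layout: one prefix per stack/component
        return f"planfiles/{self.stack}/{self.component}/{self.sha}.planfile"


def latest_valid_plans(records, last_change_seq):
    """Pick at most one planfile per (stack, component).

    last_change_seq maps (stack, component) to the sequence number of the
    newest commit that touched it. A plan made before that commit is stale
    and must be replanned rather than applied.
    """
    best = {}
    for r in records:
        target = (r.stack, r.component)
        if r.commit_seq < last_change_seq.get(target, r.commit_seq):
            continue  # superseded by an upstream commit
        if target not in best or r.commit_seq > best[target].commit_seq:
            best[target] = r
    return best


def plans_to_apply(records, last_change_seq, order):
    """Return valid plans sorted by component order; abort if there are none."""
    valid = latest_valid_plans(records, last_change_seq)
    if not valid:
        raise RuntimeError("no valid planfile found; aborting apply")
    rank = {component: i for i, component in enumerate(order)}
    return sorted(
        valid.values(),
        key=lambda r: (rank.get(r.component, len(rank)), r.stack),
    )
```

This treats any newer commit against the same stack and component as invalidating every earlier plan, which is the conservative choice. It leaves open where the records live (S3 object metadata or a DynamoDB table) and how concurrent applies are locked.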
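For drift detection, a scheduled replan only needs to tell “no changes” apart from “changes pending”. Terraform's `plan -detailed-exitcode` flag does this: it exits 0 when the plan is empty, 1 on error, and 2 when changes are present. The sketch below classifies that exit code and builds a webhook payload. The payload fields and function names are hypothetical, and whether atmos forwards this flag unchanged is an assumption to verify.

```python
import json
import urllib.request


def classify_plan_exit(code):
    """Interpret the exit code of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "in_sync"
    if code == 2:
        return "drift"
    return "error"


def drift_event(stack, component, exit_code):
    """Return a webhook payload for drift or errors, or None when in sync."""
    status = classify_plan_exit(exit_code)
    if status == "in_sync":
        return None
    # Field names are illustrative; match them to the receiving system
    return {"stack": stack, "component": component, "status": status}


def post_webhook(url, payload):
    """POST the payload as JSON to a webhook receiver chosen by the caller."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A scheduled job would run the plan for each stack and component, pass the exit code to `drift_event`, and call `post_webhook` only when the result is not `None`.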
Risks
- GitHub Actions does not provide a good dashboard overview of workflow runs. This could be mitigated with something like https://github.com/chriskinsman/github-action-dashboard
Mocking GitHub Action Workflows
In the Checks UI, each job is a component and each step is an environment.
```yaml
name: "plan"
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      # `**` also matches files in nested directories; `*` does not cross `/`
      - "stacks/**"
      - "components/**"
jobs:
  atmos-plan:
    runs-on: self-hosted-runner
    steps:
      - name: "Checkout source code at current commit"
        uses: actions/checkout@v2
      - name: Atmos do everything
        # Placeholder command for this mock-up
        run: atmos plan do-everything
      - name:
        id: prepare
        env:
          LATEST_TAG_OS: 'alpine'
          BASE_OS: ${{matrix.os}}
        run: |
```
Decision
DECIDED: