
Decide on Argo CD Architecture

Deciding on the architecture for Argo CD involves considering multiple clusters, plugin management, and Kubernetes integration. We present some recommended strategies and considerations for deploying Argo CD, addressing potential risks, and detailing common deployment patterns.

Considerations

  • Multiple Argo CD clusters should be used to provide a means to systematically upgrade Argo CD in different environments.

  • Argo CD runs as a single pod and requires disruptive restarts to add or upgrade plugins.

  • Restarts of Argo CD are disruptive to deployments.

  • The more Argo CD servers, the harder it is to visualize the delivery process.

  • Each Argo CD server must be integrated with each cluster it deploys to.

  • Argo CD can automatically deploy to the local cluster by installing a service account.

Our recommendation is to deploy one Argo CD per cluster.

Introduction

Argo CD is a tool designed specifically for continuous delivery to Kubernetes. It is similar to specialized platforms like Terraform Cloud, which focuses on deploying with Terraform. Argo CD does not support deployments outside of Kubernetes, such as uploading files to a bucket. While it does support plugins, these plugins are not intended to extend its deployment capabilities beyond Kubernetes.

Two forms of escape hatch exist for slight deviations, such as deployments involving Kubernetes-adjacent tooling like Helm, Kustomize, etc.:

  1. Using Argo CD Config Management Plugins that shell out and generate Kubernetes manifests (a minimal registration example follows)

  2. Using the Operator Pattern to deploy Custom Resources that perform some operation
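
For the first escape hatch, a plugin is registered with Argo CD and invoked per Application. Below is a minimal sketch of the legacy ConfigMap-based registration (the plugin name and helmfile commands are illustrative assumptions; newer Argo CD versions register plugins via a sidecar instead):

# Legacy Config Management Plugin registration in the argocd-cm ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  configManagementPlugins: |
    - name: helmfile
      init:
        # runs before generate, e.g. to fetch chart dependencies
        command: ["sh", "-c", "helmfile deps"]
      generate:
        # must print Kubernetes manifests to stdout
        command: ["sh", "-c", "helmfile template"]

An Application then selects the plugin by name via spec.source.plugin.name.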

Risks

While the Operator Pattern is ideal in theory, the reality is less than ideal:

  • Operators are frequently abandoned, or not regularly maintained (most are lucky to achieve traction)

  • Most operators are in alpha state, seemingly created as pet-projects to scratch an itch

  • Upgrading operators is non-trivial because you cannot have 2 versions deployed at the same time

  • Operators don’t automatically play well with other operators. For example, how would you pass a secret written by the ExternalSecrets operator to a Terraform operator?

  • When an Operator fails, it might not break the pipeline. Debugging is also harder due to the asynchronous nature of operators.

Use-Cases

Here are some of the most common deployment patterns we see, and some ways in which those could be addressed:


  1. Deploy a generic application to Kubernetes

     • Raw manifests are supported natively.

     • Render Helm charts to manifests, then proceed as usual.

     • (Secrets and Secret Operators) Use the ExternalSecrets Operator.

  2. Deploy a generic Lambda

     • Convert to the Serverless Framework.

     • Convert to Terraform.

  3. Deploy Serverless Framework Applications

     • Serverless applications render to CloudFormation. See the CloudFormation use-case below.

     • Wrap CloudFormation in a Custom Resource.

  4. Deploy Infrastructure with CloudFormation

     • https://github.com/linki/cloudformation-operator

  5. Deploy a Single Page Application to S3

     • This does not fit well into the Argo CD model, since nothing is deployed to Kubernetes.

  6. Deploy Infrastructure with Terraform

     • Most operators are alpha, e.g. https://github.com/hashicorp/terraform-k8s. Something feels wrong about deploying Kubernetes with Terraform and then running Terraform inside of Kubernetes?

     • While it works, it’s a bit advanced to express.

  7. Deploy Database Migrations

     • Replicated (the enterprise application delivery platform) maintains SchemaHero, which only supports DDL. https://github.com/schemahero/schemahero

     • Standard Kubernetes Jobs calling a migration tool (see the Job sketch after this list).
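
For the database migration use-case, the simplest approach is a plain Kubernetes Job that Argo CD runs as a sync hook. A minimal sketch, assuming the golang-migrate CLI and a Secret holding the database URL (both illustrative):

# Hypothetical migration Job that Argo CD runs before syncing the application
apiVersion: batch/v1
kind: Job
metadata:
  name: example1-db-migrate
  annotations:
    # run as a PreSync hook so migrations complete before the app is updated
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          # golang-migrate CLI; in practice, use an image that bundles the /migrations directory
          image: migrate/migrate
          args: ["-path", "/migrations", "-database", "$(DATABASE_URL)", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: example1-db   # assumed Secret, e.g. created by ExternalSecrets
                  key: url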

Pros

  • Simplify dependency management across components (eventually, Argo CD will redeploy everything)

  • Protect KubeAPI from requiring public access (reduce attack surface)

  • Powerful CD tool for Kubernetes supporting multiple rollout strategies for Pods

  • Nice UI

  • Easy to use many kinds of deployment toolchains (that run in the Argo CD Docker image)

  • Feels like deployments are faster

  • “Backup kubernetes cluster” out of the box

  • Consistent framework for how to do continuous deployment regardless of CI platform

Cons

  • Breaks the immediate feedback loop from commit to deployment (deployments with Argo CD are asynchronous)

  • The Application resource must be created in the namespace where Argo CD is running

  • Application names must be unique per Argo CD instance

  • A custom deployment toolchain (anything except raw Kubernetes resources/Helm/Kustomize/Jsonnet) requires building a custom Docker image for Argo CD and redeploying it

  • Redeploying Argo CD is potentially disruptive to running deployments (like restarting Jenkins) and therefore must be planned

  • Updating plugins requires redeploying Argo CD, since the tools must exist in the Argo CD Docker image

  • Access management has an additional level: GitHub repo access + Argo CD projects + RBAC. We risk a Helm Tiller type of problem

  • An additional self-hosted solution to operate (whereas a classic deploy step with Helm 3 runs on CI and uses only kubectl)

  • Repository management (giving Argo CD access to private repos) does not support a declarative approach (needs research into a “repo pattern” workaround)

  • Argo CD is in the critical path of deployments and has its own SDLC.

Infrastructure

Create terraform-helm-argocd module

  • Deploy Argo CD with Terraform so it works well with the Spacelift setup for continuous delivery to Kubernetes

  • Use the terraform-helm-provider

  • Use projects/terraform/argocd/ (do not bundle with projects/terraform/eks/) in https://github.com/acme/infrastructure

  • Use Spacelift to deploy/manage it with GitOps

  • Use the terraform-github-provider to create the deployment repositories (e.g. deploy-prod) for Argo CD and to manage all branch protections (confirm with acme)

Create GitHub PAT for Managing Repos

info

The scopes outlined in the provider documentation cover the entirety of the provider's capabilities. Applying the principle of least privilege, this component only interacts with a few repositories and their settings; therefore the PAT generated for this specific use-case should be limited to those actions and should not contain overly permissive scopes such as admin:org.

The PAT needs to have the following permissions:

| Scope(s) | Purpose |
|---|---|
| repo | Create repositories; manage repository settings; modify repository contents |
| read:org, read:discussion | Used to validate teams when setting up branch protections and team access |

Once generated, the PAT should be stored in an SSM parameter (by default at /argocd/github/api_key) where it will be retrieved and passed to the integrations/github Terraform provider.

Create Repos for Deployment

We’ll use 3 repos: “argocd-deploy-prd”, “argocd-deploy-non-prd”, and “argocd-deploy-preview” (for release apps).

Repos will be created manually due to the GitHub organization permissions required.

Repos will be managed with Terraform to configure branch protection, using a bot user that has admin permissions on the repos.

Repo structure

We’ll create 3 repos to start.

  • acme/argocd-deploy-prd/

  • acme/argocd-deploy-non-prd/

  • acme/argocd-deploy-preview/ (for release apps)

By separating the preview environments from the other repos, we’re able to lifecycle it more frequently (e.g. resetting the commit history when it gets too large). We can also be more liberal about branch protections to enable continuous delivery without approvals.

Argo CD will only deploy from the main branch.

Here’s an example layout

| Repository | Cluster (region & stage) | Kubernetes Namespace | Kubernetes Manifests |
|---|---|---|---|
| acme/argocd-deploy-prd/ | uw2-prd | * | argocd/, prd/: bazel-demo-frontend.yaml, bazel-demo-api.yaml, bazel-demo-db.yaml (output from helmfile template) |
| | uw2-tools | * | argocd/, github-action-runners/ |
| | uw2-auto | * | |
| acme/argocd-deploy-non-prd/ | uw2-stg | * | argocd/, stg/: bazel-demo-frontend.yaml, bazel-demo-api.yaml, bazel-demo-db.yaml (output from helmfile template), uat/ |
| | uw2-dev | * | |
| | uw2-sbx | * | |
| acme/argocd-deploy-preview/ | uw2-dev | * | pr2-example-app/example.yaml |

Setup Branch Protections

Production

We need to tightly control who can deploy to these environments. Using branch protections, no one (including bot users) can commit directly to the main branch.

Pull Requests will be the only way to change the main branch and will require approvals from the teams defined in CODEOWNERS (based on the cluster and namespace), as well as passing status checks.

Required status checks to merge the pull requests:

  1. Workflows in the non-production environment will post a status indicating success or failure. For example, a commit with a successful “staging deployment” (success) check is eligible for deployment to production (see the sketch after this list).

  2. Workflows in acme/argocd-deploy-prd/.github/workflows/ will lint the Kubernetes manifests (where manifests exist)
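
A minimal sketch of how the non-production workflow could post that status check with the GitHub Status API (the check name, commit SHA wiring, and surrounding workflow are illustrative assumptions):

# Hypothetical step appended to the non-prd deployment workflow
- name: Report staging deployment status
  uses: actions/github-script@v7
  env:
    RELEASE_SHA: ${{ github.sha }}   # commit corresponding to the release
  with:
    script: |
      await github.rest.repos.createCommitStatus({
        owner: context.repo.owner,
        repo: context.repo.repo,
        sha: process.env.RELEASE_SHA,
        state: "success",                 // or "failure" on a failed deploy
        context: "staging deployment",    // name required by the prd branch protection
        description: "Staging deployment succeeded",
      });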


Pull Requests will be predominantly machine generated by GitHub Actions. However, anyone can open a Pull Request (e.g for hotfixes), and with appropriate approvals get them deployed.

Non-production

The non-prod environments will work identically to the production environments, with one exception: we’ll use the Mergify bot to automatically approve pull requests matching specific criteria, providing the same core functionality as production without the human overhead of acquiring approvals.

Preview Environments

These environments will not require Pull Requests; however, the main branch will be restricted to commits from acmebot. This enables rapid continuous delivery for review apps.

Deploy Argo CD with the terraform-helm-argocd module on each target cluster

We’ll deploy Argo CD on each target cluster so it can easily manage in-cluster resources and we don’t need to worry about cross-cluster networking, permissions, roles, etc.

Configuration

Settings (non-sensitive)

These come from the example1/env/ folder and are defined by namespace.

Secrets

Release Engineering Process

This is our recommended process for continuous delivery of monorepos.

Using Argo CD with helmfile provides limited utility for the following reasons:

  • We cannot use helmfile’s native ability to fetch secrets from ASM or SSM, because that will result in secrets getting committed to source control

  • We cannot use helmfile hooks, because those run at execution time of helmfile, and not during synchronization with Argo CD

Instead, we can achieve similar results using native support for Helm in Argo CD and the External Secrets Operator.
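
For example, instead of helmfile resolving secrets at render time, the chart can ship an ExternalSecret that the External Secrets Operator resolves inside the cluster, so nothing sensitive is ever committed. A minimal sketch, assuming a ClusterSecretStore named aws-ssm and an SSM parameter path (both illustrative):

# Hypothetical ExternalSecret rendered by the chart and resolved in-cluster
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example1-db
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-ssm               # assumed ClusterSecretStore backed by AWS SSM
    kind: ClusterSecretStore
  target:
    name: example1-db           # Kubernetes Secret created by the operator
  data:
    - secretKey: url
      remoteRef:
        key: /example1/db/url   # assumed SSM parameter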

Inside each monorepo, define a charts folder that has all charts for this repository.

  • We recommend one chart for each type of app (e.g. c++ microservice), rather than one chart per app.

  • A GitHub Action should validate the chart (lint check) for all Pull Requests (see the workflow sketch below)
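
A minimal sketch of such a lint workflow in the monorepo (the workflow path, chart path, and action versions are assumptions):

# Hypothetical .github/workflows/chart-lint.yml in the monorepo
name: chart-lint
on:
  pull_request:
    paths:
      - "charts/**"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-helm@v4
      # lint the shared chart; repeat (or use a matrix) for additional charts
      - run: helm lint charts/monochart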

Preview Environments

The developer adds a deploy label to the Pull Request.

A GitHub Action Workflow runs on the events “opened”, “reopened”, and “synchronize”:

  • Builds & pushes docker images

  • Checks out the main branch in the preview repo

  • Commits the following custom resource and pushes it to the main branch

File is generated by the GitHub Action based on example1/app.yaml

File is written to acme/argocd-deploy-preview/uw2-stg/argocd/pr2-example1.yaml

Example 1: acme/argocd-deploy-preview/uw2-stg/argocd/pr2-example1.yaml (artifact)

# Example Argo CD Application (use built-in Helm) generated from the app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pr2-example1
  # This should always be argocd (it’s not where the app will run)
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  source:
    # Example of shared chart in the bazel-monorepo-demo
    path: charts/monochart
    repoURL: acme/bazel-monorepo-demo/
    targetRevision: {git-sha}
    helm:
      version: v3
      valueFiles:
        - example1/env/default.yaml
        - example1/env/preview.yaml
      parameters:
        - name: image
          value: ecr/example1:{git-sha}

Examples of the values files and app.yaml in acme/bazel-monorepo-demo/example1/:

acme/bazel-monorepo-demo/example1/env/default.yaml
# This is a standard helm values file
# The defaults set common values for all releases
use_db: true

acme/bazel-monorepo-demo/example1/env/preview.yaml
# This is a helm values file
# This is an example of overriding the defaults. It needs parameterization.
host: pr12-example1.acme.org

acme/bazel-monorepo-demo/example1/app.yaml
# This is the configuration used by the GitHub Action to build the Argo CD Application manifest
chart: charts/monochart

GitHub Action Step:

  1. Looks at app.yaml (our own custom specification) to generate the Argo CD Application

  2. Produces acme/argocd-deploy-preview/uw2-stg/argocd/pr2-example1.yaml (see the sketch below)
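
A rough sketch of such a step, assuming yq is available on the runner and that app.yaml only carries the chart path (names and paths mirror the examples above and are illustrative):

# Hypothetical workflow step that renders the Application manifest from app.yaml
- name: Generate Argo CD Application manifest
  env:
    PR: ${{ github.event.pull_request.number }}
    SHA: ${{ github.sha }}
  run: |
    CHART=$(yq '.chart' example1/app.yaml)   # e.g. charts/monochart
    cat > "pr${PR}-example1.yaml" <<EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: pr${PR}-example1
      namespace: argocd
    spec:
      source:
        path: ${CHART}
        repoURL: acme/bazel-monorepo-demo/
        targetRevision: ${SHA}
        helm:
          valueFiles:
            - example1/env/default.yaml
            - example1/env/preview.yaml
    EOF
    # a later step commits this file to acme/argocd-deploy-preview/uw2-stg/argocd/ on main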

Example 2

Fundamentally similar to Example 1. The difference is that it bypasses the need for Helm.

# Example with Raw Kubernetes Manifests (alternative to helm strategy)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pr2-example1
  # This should always be argocd (it’s not where the app will run)
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: pr2-example1
  project: default
  source:
    path: uw2-stage/pr2-example1/
    repoURL: acme/argocd-deploy-preview/
    targetRevision: main
---
# Deployment
# Service
# Ingress

Staging Releases

The process always starts by cutting a GitHub release against the monorepo.

This triggers the GitHub Action on release events.

  • Retag the artifacts (presumed to already exist)

  • Open Pull Request against acme/argocd-deploy-non-prd/

  • Open Pull Request against acme/argocd-deploy-prd/

When respective pull requests are merged to main, subject to all branch protections, Argo CD kicks off the automatic deployment.
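
A minimal sketch of the release-triggered workflow, using a generic pull-request action (repository names, helper scripts, and the bot token are illustrative assumptions):

# Hypothetical .github/workflows/release.yml in the monorepo
name: release
on:
  release:
    types: [published]
jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Retag the previously built artifacts with the release tag (helper assumed)
      - run: ./scripts/retag-images.sh "${{ github.event.release.tag_name }}"
      # Check out the non-prd deploy repo, update manifests, and open a PR against main
      - uses: actions/checkout@v4
        with:
          repository: acme/argocd-deploy-non-prd
          token: ${{ secrets.ACMEBOT_TOKEN }}
          path: deploy-non-prd
      - run: ./scripts/render-manifests.sh deploy-non-prd/uw2-stg   # helper assumed
      - uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ secrets.ACMEBOT_TOKEN }}
          path: deploy-non-prd
          branch: "argocd/${{ github.event.release.tag_name }}"
          title: "Deploy ${{ github.event.release.tag_name }} to staging"
      # An analogous set of steps opens the Pull Request against acme/argocd-deploy-prd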

Production Releases

Production releases are triggered by merging the respective PR, subject to all branch protections.

The most notable branch protection is the requirement that the status check for the commit (corresponding to the release) has a passing “staging deployment” check.


Demo

Update acme/bazel-monorepo-demo with 3 or more examples (we’ll just clone the current one) to set up example1/, example2/, example3/ services. The objective is to show how we can use shared charts as well as dedicated charts per service.

Convert helmfile raw chart examples to sample charts in the charts/ folder of the monorepo.

Deprecate the helmfile.yaml example

Update GitHub Actions workflow

  1. Produce the Argo CD Application manifest artifact from the app.yaml

  2. Open Pull Request against repo (for prd and non-prd)

  3. Push directly for preview environments

Create acme/argocd-deploy-non-prd/.github/mergify.yml for auto-merging PRs

acme/argocd-deploy-non-prd/.github/mergify.yml
pull_request_rules:
  - name: "approve automated PRs that have passed checks"
    conditions:
      - "check-success~=lint/kubernetes"
      - "check-success~=lint/helm"
      - "base=main"
      - "author=acmebot"
      - "head~=argocd/.*"
    actions:
      review:
        type: "APPROVE"
        bot_account: "acmebot"
        message: "We've automatically approved this PR because the checks from the automated Pull Request have passed."

  - name: "merge automated PRs when approved and tests pass"
    conditions:
      - "check-success~=lint/kubernetes"
      - "check-success~=lint/helm"
      - "base=main"
      - "head~=argocd/.*"
      - "#approved-reviews-by>=1"
      - "#changes-requested-reviews-by=0"
      - "#commented-reviews-by=0"
      - "author=acmebot"
    actions:
      merge:
        method: "squash"

  - name: "delete the head branch after merge"
    conditions:
      - "merged"
    actions:
      delete_head_branch: {}

  - name: "ask to resolve conflict"
    conditions:
      - "conflict"
    actions:
      comment:
        message: "This pull request is now in conflict. Could you fix it @{{author}}? 🙏"

  - name: "remove outdated reviews"
    conditions:
      - "base=main"
    actions:
      dismiss_reviews:
        changes_requested: true
        approved: true
        message: "This Pull Request has been updated, so we're dismissing all reviews."

Create the sample acme/argocd-deploy-non-prd/.github/CODEOWNERS

acme/argocd-deploy-non-prd/.github/CODEOWNERS
uw2-stg/* @acme/staging @acme/mergebots

Enable branch protections

Research

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  config.yaml: |
    triggers:
      # Define your custom trigger
      - name: my-custom-trigger
        condition: app.status.sync.status == 'Unsynced' && ( time.Parse(app.status.operationState.finishedAt) - time.Parse(app.status.operationState.startedAt) ) < 100
        template: my-custom-template
    templates:
      # Add your custom template
      - name: my-custom-template
        title: Hello {{.app.metadata.name}}
        body: |
          Application details: {{.context.argocdUrl}}/applications/{{.app.metadata.name}}.

  • https://argocd-notifications.readthedocs.io/en/stable/services/opsgenie/

apiVersion: argoproj.io/v1alpha1
kind: Notification
metadata:
  name: notification-1
spec:
  # Define the resource that needs to be monitored.
  monitorResource:
    Group: "argoproj.io"
    Resource: "application"
    Version: "v1alpha1"
    Namespace: default
  notifiers:
    - name: slack
      slack:
        hookUrlSecret:
          name: my-slack-secret
          key: hookURL
        channel: testargonotification
        hookurl: "https://hooks.slack.com"
  rules:
    - allConditions:
        - jsonPath: "status/sync/status"
          operator: "ne"
          value: "Unsynced"
      Events:
        - message: "Condition Triggered : Deployment = {{.metadata.name}} replicaset does not match. Required Replicas = {{.status.replicas}}, Current Replicas={{.status.readyReplicas}}"
          emailSubject: "Argo Notification Condition Triggered {{.metadata.name}}"
          notificationLevel: "warning"
          notifierNames:
            - "slack"
      name: rule1
      initialDelaySec: 60
      throttleMinutes: 5

  • Can deliverybot simplify some of this process (including updating the GitHub Status API)?

  • At the moment it looks like deliverybot will add complexity, since the only thing it does is trigger a deployment event for GitHub Actions, which do all the work.

https://deliverybot.dev/docs/guide/2-deploy-action/

https://github.com/deliverybot/example-helm/blob/master/.github/workflows/cd.yml

Suggestions

Common suggestions and practices to consider and discuss in the implementation of Argo CD solutions.

ALB and Ingress

Overview

  • Argo CD is capable of deploying and updating resources of all types on a Kubernetes cluster (including RBAC and Ingress, etc.). Some extra consideration should be taken in the Architecture used per customer engagement. Where possible we should try to isolate the Argo CD API and UI from other application and service traffic.

  • The term common ALB/Ingress in this section refers to identifying an ingress by group.name, where any ingress that references the same group.name applies its configuration to the same ALB resources. There is also a concept of weighting group-based rules, so it is possible for another ingress’s annotations to supersede already-established configurations by setting a smaller number (higher priority) with group.order (see the example below).

  • Previous load balancer patterns used with Cloud Posse engagements utilize a single, common, internet-facing ALB, and Ingress Groups ensure that all services use this ALB and Ingress. Services such as Argo CD should probably not be internet-facing, even if they are backed by authentication. Maybe we need to extend this pattern to multiple group.name ALBs: for internet-facing, public-facing, and any service that should be treated as a separate security or functional concern.

  • Ingress groups should not overlap different k8s RBAC permission boundaries (e.g. namespaces), because different users can override existing rules in the ingress group with higher-priority rules. So maybe common ALBs should be per k8s namespace.
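
A sketch of the group.name / group.order idea above, keeping Argo CD off the common internet-facing ALB (the group name, scheme, and hostname are illustrative assumptions):

# Hypothetical Ingress that places Argo CD on a separate, internal ALB group
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server
  namespace: argocd
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internal            # not internet-facing
    alb.ingress.kubernetes.io/group.name: internal-tools  # separate ALB from the public group
    alb.ingress.kubernetes.io/group.order: "10"           # lower numbers take priority
spec:
  rules:
    - host: argocd.internal.acme.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 443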

References