Decide on Self-Hosted GitHub Runner Strategy

Problem

GitHub Actions enables organizations to run their own runners (a.k.a. workers) free of charge. The only problem is that we have to host them ourselves, and there are a few ways of doing so, each with its own pros and cons. For the most part, we don't consider deploying self-hosted runners optional when using GitHub Actions.

Considerations

We have prior art for the following strategies. The strategies are not mutually exclusive, but most often we see companies only implement one solution.

GitHub Actions is the CI/CD platform offered by GitHub. Its main advantages are that, firstly, when using self-hosted Runners, it comes at no additional cost on top of the cost of seats for users within the GitHub organization. Secondly, there is a thriving ecosystem of open source Actions that can be used within GitHub Actions workflows.

Self-hosted Runners on EC2

GitHub Actions Runners can be self-hosted on EC2 or on Kubernetes clusters. EC2 is probably the easiest option to deploy and to understand, but it is the least optimal way of managing runners.

On EC2, the Runner executable can be installed using a user-data script or baked into the AMI of the instance. The runners can be deployed in an Auto Scaling Group (ASG), which usually presents a de-registration problem when the ASG scales in; however, EventBridge and AWS Systems Manager can be used in tandem to de-register the runner prior to termination (see Cloud Posse's github-runners component, which has this implementation).
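For illustration, a minimal user-data sketch (as cloud-init YAML) for installing and registering the runner on boot might look like the following. The runner version, the organization URL, and the mechanism for obtaining RUNNER_REGISTRATION_TOKEN (a registration token issued by the GitHub API) are all assumptions to be filled in by your provisioning tooling:

```yaml
#cloud-config
# Hypothetical sketch: install and register a GitHub Actions runner on first boot.
runcmd:
  - |
    mkdir -p /opt/actions-runner && cd /opt/actions-runner
    curl -fsSL -o runner.tar.gz \
      "https://github.com/actions/runner/releases/download/v2.311.0/actions-runner-linux-x64-2.311.0.tar.gz"
    tar xzf runner.tar.gz
    export RUNNER_ALLOW_RUNASROOT=1   # config.sh refuses to run as root otherwise
    ./config.sh --unattended \
      --url https://github.com/ExampleOrg \
      --token "$RUNNER_REGISTRATION_TOKEN" \
      --labels self-hosted,linux
    ./svc.sh install && ./svc.sh start
```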

It’s also possible to use Spot Instances, and AWS SSM can be used to connect to the runners. A simple autoscaling policy is to scale up whenever sustained CPU utilization exceeds 5% for ~5 minutes (i.e. the runner is doing anything) and scale down when CPU utilization stays below 2% for 45 minutes (i.e. the runner is doing nothing), with a minimum of 1 node online. This is a good route if builds need to run on multiple kinds of architectures (e.g. ARM64 for M1), for which operating Kubernetes node pools would be cumbersome.
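As a sketch of the scale-out half of that policy (expressed here as CloudFormation YAML purely for illustration; Cloud Posse's own component implements the equivalent in Terraform), where RunnerAsg and the resource names are placeholders:

```yaml
Resources:
  RunnerScaleOutPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref RunnerAsg
      AdjustmentType: ChangeInCapacity
      ScalingAdjustment: 1            # add one runner instance at a time
  RunnerBusyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref RunnerAsg
      Statistic: Average
      Period: 60                      # one-minute samples
      EvaluationPeriods: 5            # sustained for ~5 minutes
      Threshold: 5                    # CPU > 5% means the runner is doing anything
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref RunnerScaleOutPolicy
```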

If we go this route, we’ll also want to determine if we should deploy the Datadog Monitoring Agent on the nodes.

https://github.com/cloudposse/terraform-aws-ec2-autoscale-group

Self-hosted Runners on Kubernetes

tip

This is our recommended approach

Deploying these Runners on Kubernetes is possible using actions-runner-controller. With this controller, a small-to-medium-sized cluster can house a large number of Runners (depending on their requested Memory and CPU resources), and these Runners can scale automatically using the controller’s HorizontalRunnerAutoscaler CRD. This has the benefit that it can scale to zero and leverages all the monitoring we have on the platform. This solution also allows for using a custom runner image without having to rebuild an AMI or modify a user-data script and re-launch instances, which would be necessary when deploying the Runners to EC2.
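A minimal RunnerDeployment sketch, assuming an organization named ExampleOrg and modest resource requests, might look like:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-runners
spec:
  replicas: 2
  template:
    spec:
      organization: ExampleOrg                  # register org-wide runners
      image: summerwind/actions-runner:latest   # swap in a custom runner image here
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
```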

actions-runner-controller also supports several mechanisms for scaling the number of Runners. PercentageRunnersBusy simply scales the Runners up or down based on how many of them are currently busy, without having to maintain a list of repositories used by the Runners, which would be required with TotalNumberOfQueuedAndInProgressWorkflowRuns.

The most efficient and recommended option for horizontal autoscaling with actions-runner-controller, however, is to enable the controller’s webhook server and configure the HorizontalRunnerAutoscaler to scale on GitHub webhook events (event name: check_run, type: created, status: queued). Note that actions-runner-controller has no logic to automatically create the webhook configuration in GitHub, so the webhook server needs to be exposed and the webhook configured manually in GitHub or via the GitHub API. If using aws-load-balancer-controller, ensure that githubWebhookServer.ingress.enabled is set to true in the actions-runner-controller Helm chart, and if using external-dns, set githubWebhookServer.ingress.annotations to include an external-dns.alpha.kubernetes.io/alias annotation. Finally, configure the webhook in GitHub to match the hostname and port of the endpoint exposed by the newly-created Ingress object.
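As a sketch, the webhook-driven configuration described above could look like the following, where the names, replica bounds, and scale-down duration are assumptions:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: org-runners-autoscaler
spec:
  scaleTargetRef:
    name: org-runners        # the RunnerDeployment from the previous example
  minReplicas: 0             # scale-to-zero when idle
  maxReplicas: 20
  scaleUpTriggers:
    - githubEvent:
        checkRun:
          types: ["created"]
          status: "queued"
      amount: 1              # add one runner per queued check_run event
      duration: "5m"         # scale back down 5 minutes after the event
```

The corresponding Helm values for exposing the webhook server (only the fields mentioned above):

```yaml
githubWebhookServer:
  enabled: true
  ingress:
    enabled: true
    annotations:
      external-dns.alpha.kubernetes.io/alias: "true"
```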

In general, Cloud Posse recommends using actions-runner-controller over EC2-based Runners due to the flexibility in runner sizing, choice of container image, and advanced horizontal scaling options. If, however, EC2 Runners are needed to meet specific requirements, such as a build environment on ARM-based instances, then that option is recommended as well.

https://github.com/summerwind/actions-runner-controller

Repository-wide or Organization-wide Runners

Self-hosted GitHub Actions Runners can be made either repository-wide or organization-wide. Runners registered for a specific repository can only run workflows belonging to that repository, while Runners registered for an organization can run any workflow for any repository in the organization, provided that the labels selected by the runs-on attribute in the workflow definition match the labels of the runner. Repo-level runners have the benefit of a reduced scope for the PAT (Personal Access Token); however, runner pools are not shared across repositories, so resources are wasted.
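With actions-runner-controller, the scope is determined by which field is set on the RunnerDeployment; the names below are placeholders:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: repo-scoped-runners
spec:
  template:
    spec:
      repository: ExampleOrg/example-repo   # repository-wide: only this repo's workflows
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-scoped-runners
spec:
  template:
    spec:
      organization: ExampleOrg              # organization-wide: any repo in the org
```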

In general, Cloud Posse recommends choosing Organization-wide Runners and ensuring horizontal scaling is configured to adequately respond to an influx of queued runs (see the previous section).

https://docs.github.com/en/actions/hosting-your-own-runners/managing-access-to-self-hosted-runners-using-groups

Labeling Runners

Whenever a GitHub Actions Runner is registered, it provides a list of labels to GitHub. Then, workflow definitions can specify which Runners to run on.

For example, if the workflow syntax specifies runs-on: [self-hosted, linux], then the runner must be registered with the label linux.

In another example, if two Runners are registered, one with the labels linux, ubuntu, and medium, and one with the labels linux, ubuntu, and large, and workflow A specifies runs-on: [self-hosted, ubuntu] and workflow B specifies runs-on: [self-hosted, ubuntu, large], then:

  • Workflow A can run on both the first runner and the second runner. It’ll run on whichever is available.

  • Workflow B can only run on the second runner.
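Expressed as minimal, hypothetical workflow files, the two examples above look like:

```yaml
# workflow-a.yml -- matches both runners; GitHub picks whichever is available
name: workflow-a
on: push
jobs:
  build:
    runs-on: [self-hosted, ubuntu]
    steps:
      - run: echo "can run on the medium or the large runner"
---
# workflow-b.yml -- matches only the runner labeled "large"
name: workflow-b
on: push
jobs:
  build:
    runs-on: [self-hosted, ubuntu, large]
    steps:
      - run: echo "runs only on the large runner"
```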

Some advanced configurations may involve creating multiple RunnerDeployment CRDs (actions-runner-controller) which use different container images with different linux distributions or with different packages installed, then naming the labels accordingly. In general, Cloud Posse’s recommendation is to create meaningful runner labels that can be later referenced by developers writing GHA Workflow YAML files.

Integration with AWS

The GitHub Actions Runners often need to perform continuous integration tasks such as writing to S3 or pushing container images to ECR. With GitHub-hosted Runners this has historically been difficult, but it is now possible using GitHub’s support for OIDC, which allows for creating a trust relationship between GitHub-hosted Runners and an IAM role in one of the organization’s AWS accounts (preferably a dedicated automation account). For self-hosted Runners managed by actions-runner-controller, the equivalent has been possible for much longer via the EKS cluster’s OIDC provider (see: IRSA). If using GHA Runners on EC2, an EC2 instance profile can be created, allowing the instances to assume an IAM role.
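For GitHub-hosted Runners, a sketch of the OIDC flow looks like the following, where the role ARN, region, and workflow name are placeholders:

```yaml
name: push-image
on: push
permissions:
  id-token: write   # required to request an OIDC token from GitHub
  contents: read
jobs:
  push:
    runs-on: ubuntu-latest
    steps:
      # Exchange the OIDC token for temporary AWS credentials
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/example-gha-oidc-role
          aws-region: us-east-1
      # Verify the assumed role
      - run: aws sts get-caller-identity
```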