
Implement Telemetry

Monitoring is a key component of any production system. It is important to have visibility into the health of your system and to be able to react to issues before they become problems.

The Problem

Monitoring is a difficult problem to solve. There are many different tools and services that can be used to monitor your system. It is important to have a consistent approach to monitoring that can be applied across all of your systems.

There is often a tradeoff between the cost of monitoring and the value it provides, so a monitoring solution needs to be cost effective while still providing value to your organization. Another common problem is misconfigured monitoring that causes more trouble than it prevents. The usual symptoms are alerts that everyone ignores or alerts that never fire at all.

Our Solution

We have developed a set of Terraform modules that can be used to deploy a monitoring solution for your system. These modules are designed to be used with Datadog. Datadog is a monitoring service that provides a wide range of features and integrations with other services.

We have broken down the monitoring solution into several components to make it easier to deploy and manage.

Implementation

Foundation

  • datadog-configuration: This is a utility component. It expects the Datadog API and APP keys to be stored in SSM (AWS Systems Manager Parameter Store) or ASM (AWS Secrets Manager), and it copies the keys to SSM/ASM in each account the component is deployed to. There are two reasons for this:
    1. Keys can be easily rotated from one place.
    2. Keys can be set for a group and then copied to all accounts in that group. For example, you could have one pair of API and APP keys for production accounts and another pair for non-production accounts.
    This component is required for all other components to work. It also stores information about your Datadog account that other components use, such as your Datadog site URL, and it provides an easy interface for other components to configure the Datadog provider.
  • datadog-integration: This is the core component that binds Datadog to AWS. It is deployed to every account and sets up all the Datadog integration tiles with AWS. This is what provides the majority of your AWS metrics to Datadog.
  • datadog-lambda-forwarder: This component is an AWS Lambda function that ships logs from AWS to Datadog. Details of it can be found here
  • datadog-monitor: This component deploys monitors via YAML configuration. When you vendor in this component you will find our catalog of pre-built monitors. We deploy this component to every account. The monitors use Terraform interpolation so that you can set the thresholds for each monitor. This lets you use the same monitors with different thresholds per stage, configured through familiar atmos inheritance.
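
The idea behind per-stage thresholds can be sketched in a few lines of Python. This is only an illustration of the pattern (a shared monitor template, default values, and per-stage overrides that are merged and then interpolated), not the component's actual implementation, which uses Terraform and atmos. The names MONITOR_TEMPLATE, DEFAULTS, STAGE_OVERRIDES, deep_merge and render_monitor, the stage tag, and the threshold values are all hypothetical.

```python
from string import Template


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical monitor definition, standing in for one entry of a YAML catalog.
# ${...} placeholders are filled in per stage.
MONITOR_TEMPLATE = {
    "name": "(${stage}) High CPU on RDS",
    "query": "avg(last_15m):avg:aws.rds.cpuutilization{stage:${stage}} > ${cpu_critical}",
}

# Hypothetical defaults shared by every stage.
DEFAULTS = {"thresholds": {"cpu_critical": 90}}

# Hypothetical per-stage overrides; a stage inherits anything it does not set.
STAGE_OVERRIDES = {
    "prod": {"thresholds": {"cpu_critical": 80}},
    "dev": {},
}


def render_monitor(stage):
    """Merge defaults with the stage's overrides, then interpolate the template."""
    settings = deep_merge(DEFAULTS, STAGE_OVERRIDES.get(stage, {}))
    params = {"stage": stage, **settings["thresholds"]}
    return {
        field: Template(text).substitute(params)
        for field, text in MONITOR_TEMPLATE.items()
    }


if __name__ == "__main__":
    for stage in ("prod", "dev"):
        print(render_monitor(stage)["query"])
```

In this sketch the prod stage overrides the critical threshold while dev inherits the default, so both stages share one monitor definition and differ only in configuration.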

EKS

  • datadog-agent: This component deploys the Datadog Agent on EKS, along with the Datadog Cluster Agent. The Agent is a DaemonSet that runs on every node in your cluster, except Fargate (serverless) nodes. This component handles sending Kubernetes metrics, logs, and events to Datadog. It can also deploy Datadog Cluster Checks, which run checks on your cluster from within the cluster itself. This is often a cheaper way to monitor services in your cluster than Synthetic Monitoring.
  • datadog-private-location-eks: This component deploys a private location for Synthetic Monitoring to your EKS cluster. This allows synthetic checks to run even inside a private cluster.

ECS

Additional

References