Use OpsGenie for Incident Management

Monitoring platforms like CloudWatch and Datadog have historically provided poor support for Incident Management. Incident Management is the art of ingesting, classifying, and escalating alerts to stakeholders based on rotations, teams, services, etc.

Latest!

The content in this ADR is up-to-date! For questions, please reach out to Cloud Posse

Context

There are quite a few incident management platforms available, with PagerDuty being the OG. Customers often ask why we selected OpsGenie over PagerDuty. This is our current rationale.

TL;DR: We support OpsGenie today and have a considerable investment in supporting it, but are open to implementing PagerDuty.

OpsGenie (Decided)

https://github.com/cloudposse/terraform-opsgenie-incident-management

Pros

  • Most customers use Atlassian products, including Jira, Service Desk, and Confluence, which are all tightly integrated with OpsGenie

  • High feature parity with PagerDuty

  • OpsGenie is an Atlassian product and is tightly integrated with the rest of the Atlassian suite

  • OpsGenie is less expensive than PagerDuty

  • OpsGenie is tightly integrated with StatusPage

  • Cloud Posse only has prior art for OpsGenie 😃 (e.g. 20+ sprints executed on OpsGenie, but none on PagerDuty)

Cons

  • Lacks some of the AI features now present in more modern Incident Management Platforms

PagerDuty

Customers frequently ask if we have PagerDuty support. The short answer is not yet. The longer answer is that we’re open to supporting it if someone sponsors the development. We support OpsGenie due to customer demand.

Pros

Cons

  • More expensive than OpsGenie

Datadog Incident Management

Datadog released its own incident management platform at the tail end of 2020. We’ve not had a chance to evaluate the platform, mostly because, as of this writing, Terraform support is non-existent. For this reason, we ruled it out.

Alert Panda

Not Considered

VictorOps

Not Considered

Decision

  • Use OpsGenie

Consequences