
Amazon DevOps Guru · Specialized · Since 2020

A service that uses machine learning to automatically detect and diagnose operational issues in applications

What It Does

Amazon DevOps Guru uses machine learning to analyze operational metrics of your AWS resources and automatically detect signs of performance degradation or failures in your applications. It integrates CloudWatch metrics, CloudTrail logs, and AWS Config change history to identify root causes and recommend remediation actions.

Use Cases

It is used for early detection of latency increases and error rate spikes in production applications, automatic detection of abnormal behavior after deployments, and diagnosing AWS service-specific issues like Lambda function timeouts and DynamoDB throttling.

Everyday Analogy

Think of it like an experienced system administrator. They constantly monitor various server gauges and notice things like 'This CPU usage pattern is unusual. It might be caused by yesterday's deployment,' catching early signs of trouble and telling you the cause and how to fix it.

What Is Amazon DevOps Guru?

Amazon DevOps Guru is a service that uses machine learning to automatically detect operational issues in applications running on AWS. Traditionally, operations teams had to monitor CloudWatch dashboards, set up alarms, and manually investigate logs when problems occurred. DevOps Guru automates much of this work: when it detects an abnormal pattern, it presents a root cause analysis and recommended actions. This reduces the burden on operations teams and helps them resolve issues faster.

Insights and Recommended Actions

When DevOps Guru detects an anomaly, it reports it as an "insight." There are two types: reactive insights (problems that are already occurring) and proactive insights (signs that could become problems in the future). Each insight includes graphs of anomalous metrics, a list of affected resources, and recommended remediation steps. For example, you might see a specific suggestion like 'DynamoDB table read capacity is insufficient. Consider switching to on-demand mode.'
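As a rough illustration of how insights and their recommendations could be read programmatically, here is a minimal sketch built around the boto3 `devops-guru` client operations `list_insights` and `list_recommendations`. The helper names (`build_status_filter`, `summarize_ongoing_insights`, `make_client`) are hypothetical and exist only for this example. The sketch reads just the first page of results and assumes valid AWS credentials when a real client is used.

```python
from typing import Any


def build_status_filter(insight_type: str) -> dict[str, Any]:
    """Build a StatusFilter that selects ongoing insights of one type."""
    if insight_type not in ("REACTIVE", "PROACTIVE"):
        raise ValueError(f"unknown insight type: {insight_type}")
    return {"Ongoing": {"Type": insight_type}}


def summarize_ongoing_insights(client: Any, insight_type: str = "REACTIVE") -> list[dict[str, Any]]:
    """Return name, severity, and recommendation names for each ongoing insight.

    Only the first page of results is read; a fuller version would follow NextToken.
    """
    response = client.list_insights(StatusFilter=build_status_filter(insight_type))
    # Reactive and proactive insights are returned under separate response keys.
    key = "ReactiveInsights" if insight_type == "REACTIVE" else "ProactiveInsights"
    summaries = []
    for insight in response.get(key, []):
        recs = client.list_recommendations(InsightId=insight["Id"])
        summaries.append({
            "name": insight.get("Name"),
            "severity": insight.get("Severity"),
            "recommendations": [r.get("Name") for r in recs.get("Recommendations", [])],
        })
    return summaries


def make_client() -> Any:
    """Create a real client. Requires boto3 and AWS credentials; never called on import."""
    import boto3
    return boto3.client("devops-guru")
```

Calling `summarize_ongoing_insights(make_client(), "PROACTIVE")` would list proactive insights instead. Passing the client in as a parameter keeps the logic easy to exercise with a stub before pointing it at a real account.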

Coverage and Notifications

DevOps Guru can analyze your entire AWS account, specific CloudFormation stacks, or resources with specific tags. It combines CloudWatch metrics, CloudTrail API call logs, and AWS Config configuration change history for comprehensive analysis. Notifications for detected anomalies can be delivered through SNS topics or EventBridge, and from there forwarded to external tools such as Slack and PagerDuty.
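The coverage and notification settings described above can be sketched with boto3 as follows. This is an illustrative outline, not a definitive setup. The helper names are hypothetical, and the stack name and topic ARN a caller passes in are placeholders. The EventBridge `detail-type` string is the one I understand DevOps Guru to emit for newly opened insights, so verify it against the current AWS documentation before relying on it.

```python
import json
from typing import Any


def stack_collection(stack_names: list[str]) -> dict[str, Any]:
    """ResourceCollection value that limits coverage to the named CloudFormation stacks."""
    return {"CloudFormation": {"StackNames": list(stack_names)}}


def sns_channel_config(topic_arn: str) -> dict[str, Any]:
    """Config value for a notification channel that publishes to an SNS topic."""
    return {"Sns": {"TopicArn": topic_arn}}


def new_insight_event_pattern() -> str:
    """EventBridge event pattern (JSON) matching newly opened DevOps Guru insights.

    The detail-type value is an assumption to confirm in the AWS documentation.
    """
    return json.dumps({
        "source": ["aws.devops-guru"],
        "detail-type": ["DevOps Guru New Insight Open"],
    })


def configure(client: Any, stack_names: list[str], topic_arn: str) -> None:
    """Add the stacks to DevOps Guru coverage and register the SNS topic.

    `client` is expected to be boto3.client("devops-guru") or a compatible stub.
    """
    client.update_resource_collection(
        Action="ADD",
        ResourceCollection=stack_collection(stack_names),
    )
    client.add_notification_channel(Config=sns_channel_config(topic_arn))
```

The pattern string returned by `new_insight_event_pattern()` could be supplied as the `EventPattern` of an EventBridge rule whose target forwards events to a chat or paging tool.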

Things to Watch Out For

  • Pricing is based on the AWS resources analyzed (billed by the hours each resource is analyzed) and on DevOps Guru API calls. Be mindful of costs in environments with many resources
  • The machine learning model takes 1-2 weeks to train, so detection accuracy may be lower immediately after activation
  • DevOps Guru detects and diagnoses problems but does not auto-remediate. The operations team must carry out the remediation actions