Assessing Application Fault Tolerance with AWS Resilience Hub - Visualizing RTO/RPO Target Achievement

Learn how to assess application fault tolerance with Resilience Hub, configure RTO/RPO policies, and leverage improvement recommendations.

Resilience Hub Overview

Resilience Hub is a service that assesses application fault tolerance and reports whether the application meets its RTO target (recovery time objective, e.g., recovery within 30 minutes) and its RPO target (recovery point objective, e.g., at most 1 hour of data loss). It automatically discovers resource configurations from sources such as CloudFormation stacks and Terraform state files, and evaluates fault tolerance across four disruption scenarios: application failure, infrastructure failure, AZ failure, and region failure.

Assessment and Improvement Recommendations

You define RTO/RPO targets for each scenario in a resiliency policy. When you run an assessment, Resilience Hub estimates whether the current architecture meets those targets and, where it falls short, generates prioritized improvement recommendations such as adding Multi-AZ configurations, enabling backups, and setting up cross-region replication.
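As a rough illustration of how per-scenario targets map onto a policy, the sketch below converts targets given in minutes into the seconds-based map that the boto3 `resiliencehub` client accepts in `create_resiliency_policy`. The helper names (`build_policy`, `create_policy`), the policy tier, and the region-level target values are assumptions made for this example. The 30-minute RTO and 1-hour RPO are the example figures from the overview.

```python
from typing import Dict

# Keys the Resilience Hub API uses for the four disruption scenarios:
# application (Software), infrastructure (Hardware), AZ, and Region.
DISRUPTION_TYPES = ("Software", "Hardware", "AZ", "Region")


def build_policy(targets_minutes: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Convert per-scenario RTO/RPO targets from minutes to the seconds-based policy map."""
    policy = {}
    for disruption in DISRUPTION_TYPES:
        target = targets_minutes[disruption]
        policy[disruption] = {
            "rtoInSecs": target["rto"] * 60,
            "rpoInSecs": target["rpo"] * 60,
        }
    return policy


def create_policy(name: str, policy: Dict[str, Dict[str, int]]) -> str:
    """Create the policy in Resilience Hub. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_policy works without AWS access

    client = boto3.client("resiliencehub")
    response = client.create_resiliency_policy(
        policyName=name,
        tier="Critical",  # assumed tier for this example
        policy=policy,
    )
    return response["policy"]["policyArn"]


# 30-minute RTO and 1-hour RPO for local failures; the Region targets are hypothetical.
example_targets = {
    "Software": {"rto": 30, "rpo": 60},
    "Hardware": {"rto": 30, "rpo": 60},
    "AZ": {"rto": 30, "rpo": 60},
    "Region": {"rto": 240, "rpo": 120},
}
example_policy = build_policy(example_targets)
```

Calling `create_policy("web-shop-policy", example_policy)` (the policy name is hypothetical) would send the request. The payload helper is kept separate so the targets can be reviewed or unit-tested without touching an AWS account.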

Testing and Operational Integration

Resilience Hub generates fault tolerance test recommendations and integrates with Fault Injection Service (FIS) so that you can test actual failure scenarios. These tests verify whether your application meets its RTO/RPO targets under scenarios such as AZ failure, EC2 instance termination, and RDS failover. A results scorecard shows the fault tolerance level of each component, which clarifies improvement priorities. Resilience Hub also generates recommended SOPs (standard operating procedures) to standardize incident response. Schedule periodic re-assessments to track how fault tolerance changes after infrastructure modifications.
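To make the pass/fail logic of such a test concrete, the sketch below compares the recovery time and data loss observed in a single fault injection drill against the RTO/RPO targets. It is an illustration under stated assumptions, not how Resilience Hub or FIS computes results: the `DrillResult` type, its field names, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Timestamps recorded during one fault injection drill (hypothetical structure)."""
    scenario: str
    fault_injected_at: datetime
    service_restored_at: datetime
    last_recoverable_point: datetime  # newest data that survived the failure

    @property
    def observed_rto(self) -> timedelta:
        # Time from the start of the failure until the service was usable again.
        return self.service_restored_at - self.fault_injected_at

    @property
    def observed_rpo(self) -> timedelta:
        # Window of data written before the failure that could not be recovered.
        return self.fault_injected_at - self.last_recoverable_point


def meets_targets(result: DrillResult, rto_target: timedelta, rpo_target: timedelta) -> bool:
    """Return True when both the observed RTO and the observed RPO are within target."""
    return result.observed_rto <= rto_target and result.observed_rpo <= rpo_target


# Targets from the overview: recover within 30 minutes, lose at most 1 hour of data.
RTO_TARGET = timedelta(minutes=30)
RPO_TARGET = timedelta(hours=1)

failover_drill = DrillResult(
    scenario="RDS failover",
    fault_injected_at=datetime(2024, 1, 10, 10, 0),
    service_restored_at=datetime(2024, 1, 10, 10, 22),
    last_recoverable_point=datetime(2024, 1, 10, 9, 45),
)
az_drill = DrillResult(
    scenario="AZ failure",
    fault_injected_at=datetime(2024, 1, 10, 14, 0),
    service_restored_at=datetime(2024, 1, 10, 14, 41),
    last_recoverable_point=datetime(2024, 1, 10, 13, 50),
)

for drill in (failover_drill, az_drill):
    status = "meets targets" if meets_targets(drill, RTO_TARGET, RPO_TARGET) else "misses targets"
    print(f"{drill.scenario}: RTO {drill.observed_rto}, RPO {drill.observed_rpo}, {status}")
```

In this made-up data the RDS failover drill recovers in 22 minutes and passes, while the AZ failure drill takes 41 minutes and misses the 30-minute RTO even though its data loss is within the RPO.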

Resilience Hub Pricing

Resilience Hub is billed per application per month rather than per assessment run: AWS lists $15.00 per application per month, with metering starting once the first assessment is run for the application. FIS test execution incurs separate FIS charges. Because the charge scales with the number of registered applications, prioritize business-critical applications rather than assessing all applications uniformly.

Summary

Resilience Hub is a service that quantitatively assesses application fault tolerance and visualizes RTO/RPO target achievement. It automatically discovers architecture from CloudFormation stacks and Terraform state files, and runs assessments for application, infrastructure, AZ, and region failure scenarios. Integration with FIS lets you run fault tests against real failure scenarios, and periodic re-assessments enable continuous improvement of fault tolerance.