AWS Fault Injection Simulator | Specialized | Since 2021
A service that intentionally injects faults into AWS environments to test system resilience
What It Does
AWS Fault Injection Simulator (FIS) is a managed service for practicing chaos engineering. It lets you safely execute various failure scenarios such as stopping EC2 instances, injecting network latency, and simulating AZ outages. Experiments include stop conditions that automatically halt the experiment if impact exceeds expectations.
Use Cases
It is used for disaster recovery drills in production-like configurations, validating multi-AZ redundancy, verifying Auto Scaling behavior, and testing application timeout handling and retry logic. It is also used to automate Game Days (incident response drills).
Everyday Analogy
Think of it like a fire drill in a building. You actually trigger the fire alarm, stop the elevators, and verify that evacuation routes to emergency stairs work correctly. Before a real fire occurs, you can safely test whether escape routes and fire safety equipment function properly.
What Is AWS Fault Injection Simulator?
AWS Fault Injection Simulator (FIS) is a service for practicing chaos engineering in AWS environments. Chaos engineering is the practice of intentionally injecting faults into production or production-like environments to observe how systems behave. It starts from the premise that 'failures will happen' and aims to uncover weaknesses in advance so they can be fixed, minimizing the impact of actual failures. FIS provides a platform for running these experiments safely and reproducibly.
Experiment Templates and Actions
In FIS, you create 'experiment templates' to define failure scenarios. Templates specify the actions to execute (types of faults), target resources, and stop conditions. Available actions include stopping/restarting EC2 instances, stopping ECS tasks, injecting network latency and packet loss, and simulating API throttling. You can build complex scenarios with multiple actions running in parallel or sequentially.
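As a rough illustration of how a template ties actions, targets, and stop conditions together, the sketch below builds a request body shaped like the input to the FIS CreateExperimentTemplate API. The account ID, ARNs, the `ChaosReady` tag, and the names `taggedInstances`, `stopInstances`, `recoveryWindow`, and `rebootInstances` are hypothetical placeholders, not values from this article. Actions without `startAfter` begin when the experiment starts; `startAfter` chains an action behind another.

```python
import json

# Hypothetical placeholders: replace with resources from your own account.
ROLE_ARN = "arn:aws:iam::111122223333:role/fis-experiment-role"
ALARM_ARN = "arn:aws:cloudwatch:us-east-1:111122223333:alarm:high-error-rate"


def build_experiment_template() -> dict:
    """Return a request body shaped like FIS CreateExperimentTemplate input."""
    return {
        "description": "Stop one tagged EC2 instance, wait, then reboot it",
        "roleArn": ROLE_ARN,
        # Halt the experiment if this CloudWatch alarm goes into ALARM state.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": ALARM_ARN}
        ],
        # Target: one instance carrying the (assumed) ChaosReady=true tag.
        "targets": {
            "taggedInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"ChaosReady": "true"},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            # Starts immediately because it has no startAfter entry.
            "stopInstances": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "taggedInstances"},
            },
            # Runs only after stopInstances has completed.
            "recoveryWindow": {
                "actionId": "aws:fis:wait",
                "parameters": {"duration": "PT2M"},
                "startAfter": ["stopInstances"],
            },
            # Runs only after the wait has completed.
            "rebootInstances": {
                "actionId": "aws:ec2:reboot-instances",
                "targets": {"Instances": "taggedInstances"},
                "startAfter": ["recoveryWindow"],
            },
        },
    }


def create_template(template: dict) -> str:
    """Submit the template with boto3 and return its ID (needs AWS credentials)."""
    import uuid
    import boto3  # imported lazily so the sketch can be read without AWS access

    fis = boto3.client("fis")
    response = fis.create_experiment_template(
        clientToken=str(uuid.uuid4()), **template
    )
    return response["experimentTemplate"]["id"]


if __name__ == "__main__":
    print(json.dumps(build_experiment_template(), indent=2))
```

Running the file only prints the template; calling `create_template` is left as an explicit step because it creates a real resource in the account.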
Running Experiments Safely
A key feature of FIS is its built-in safety mechanisms. You can specify CloudWatch alarms as stop conditions, for example to stop the experiment automatically when the error rate exceeds a threshold. The IAM role that FIS assumes for an experiment limits which resources the experiment can affect, preventing unintended impact. Detailed execution logs are recorded for post-experiment analysis and improvement.
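The two safety mechanisms described above can be sketched as plain data structures. This is a minimal illustration, not a complete role definition: the tag key and value are hypothetical, and a real experiment role usually needs additional permissions depending on the actions and stop conditions it uses.

```python
def alarm_stop_condition(alarm_arn: str) -> dict:
    """Stop-condition entry that halts an experiment when the alarm fires."""
    return {"source": "aws:cloudwatch:alarm", "value": alarm_arn}


# Trust policy that lets the FIS service assume the experiment role.
FIS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "fis.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def scoped_ec2_policy(tag_key: str, tag_value: str) -> dict:
    """Permissions policy limiting stop/start to instances with a given tag.

    The tag is an assumed convention for marking instances as safe targets.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:StopInstances", "ec2:StartInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            },
            {
                # Describe calls do not support resource-level restrictions.
                "Effect": "Allow",
                "Action": "ec2:DescribeInstances",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    import json

    print(json.dumps(scoped_ec2_policy("ChaosReady", "true"), indent=2))
```

Scoping by tag means an experiment that accidentally selects an untagged instance fails at the permission check instead of stopping it.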
Things to Watch Out For
- Pricing is based on action execution time, billed per action-minute. Designing experiment templates incurs no charge
- Without proper stop conditions, experiments may impact production more than expected. Always configure CloudWatch alarms as stop conditions
- FIS experiments cause real impact on target resources. Test thoroughly in development or staging environments before running in production
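The first two cautions can be turned into a small pre-flight check. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the template is a plain dict in the shape used by CreateExperimentTemplate, and the per-action-minute rate is passed in by the caller because the actual rate should be taken from the current FIS pricing page.

```python
from typing import Mapping


def has_alarm_stop_condition(template: Mapping) -> bool:
    """True if at least one stop condition points at a CloudWatch alarm."""
    return any(
        cond.get("source") == "aws:cloudwatch:alarm" and bool(cond.get("value"))
        for cond in template.get("stopConditions", [])
    )


def estimate_action_cost(
    action_minutes: Mapping[str, float], rate_per_action_minute: float
) -> float:
    """Sum the minutes each action runs and multiply by the supplied rate."""
    return sum(action_minutes.values()) * rate_per_action_minute


def preflight(template: Mapping) -> None:
    """Refuse to proceed when a template has no alarm-based stop condition."""
    if not has_alarm_stop_condition(template):
        raise ValueError("template has no CloudWatch alarm stop condition")


if __name__ == "__main__":
    # Hypothetical template fragment and rate, for illustration only.
    template = {
        "stopConditions": [
            {
                "source": "aws:cloudwatch:alarm",
                "value": "arn:aws:cloudwatch:us-east-1:111122223333:alarm:high-error-rate",
            }
        ]
    }
    preflight(template)
    minutes = {"stopInstances": 5, "recoveryWindow": 2}
    print(f"estimated cost: {estimate_action_cost(minutes, 0.10):.2f}")
```

A check like this fits naturally in a deployment pipeline, so that a template using only the `none` stop-condition source never reaches a production account.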