Google Cloud Professional Cloud DevOps Engineer — Question 145
You encounter a large number of outages in the production systems you support. You receive alerts for all of these outages; the alerts are caused by unhealthy systems that are automatically restarted within a minute. You want to set up a process that prevents staff burnout while following Site Reliability Engineering (SRE) practices. What should you do?
Answer options
- A. Eliminate alerts that are not actionable
- B. Redefine the related SLO so that the error budget is not exhausted
- C. Distribute the alerts to engineers in different time zones
- D. Create an incident report for each of the alerts
Correct answer: A
Explanation
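The correct answer is A. The unhealthy systems are restarted automatically within a minute, so no human needs to act on these alerts: they are not actionable. SRE practice holds that every alert reaching an on-call engineer should require action. Removing non-actionable alerts reduces noise, lets engineers focus on real problems, and prevents burnout. Option B moves the reliability target to avoid exhausting the error budget, but it does not address the immediate problem of alert fatigue. Option C spreads the workload across time zones, but it does not reduce the volume of alerts that need no response. Option D adds administrative overhead (documenting events that have already resolved themselves) without improving the situation.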
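To illustrate what "actionable" means here, the following minimal Python sketch pages only when a system stays unhealthy well beyond the roughly one-minute window the automatic restart needs. The class and function names and the 300-second threshold are hypothetical assumptions chosen for illustration, not part of any Google Cloud API.

```python
from dataclasses import dataclass
from typing import List

# Assumption (hypothetical value): automatic restarts recover a system within
# ~60 s, so only a system still unhealthy after 5 minutes needs a human.
PAGE_AFTER_SECONDS = 300


@dataclass
class UnhealthyEvent:
    system: str
    unhealthy_seconds: float  # how long the system stayed unhealthy


def is_actionable(event: UnhealthyEvent, page_after: float = PAGE_AFTER_SECONDS) -> bool:
    """An event needs a human only if automation failed to recover it in time."""
    return event.unhealthy_seconds >= page_after


def events_to_page(events: List[UnhealthyEvent]) -> List[UnhealthyEvent]:
    """Keep only actionable events; self-healed blips never reach the pager."""
    return [e for e in events if is_actionable(e)]


if __name__ == "__main__":
    events = [
        UnhealthyEvent("frontend-1", 45),   # restarted automatically
        UnhealthyEvent("frontend-2", 50),   # restarted automatically
        UnhealthyEvent("db-primary", 900),  # restart did not fix it
    ]
    for e in events_to_page(events):
        print(f"PAGE: {e.system} unhealthy for {e.unhealthy_seconds:.0f}s")
```

Run as a script, only `db-primary` produces a page; the two self-healed frontends are filtered out. In Cloud Monitoring, a comparable effect is typically achieved by setting the `duration` of an alerting-policy condition longer than the auto-restart window, so short-lived failures never trigger a notification.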