Google Cloud Professional Cloud Architect — Question 202
Your company runs a critical, revenue-generating ecommerce application that is served by a regional managed instance group (MIG) behind an external HTTP(S) Load Balancer. The operations team is currently overwhelmed with low-priority notifications and is starting to ignore alerts. Your team's service level objective (SLO) is to maintain 99.9% availability, which is measured by the ratio of successful requests (2xx status codes) to total requests. You want to minimize noise from non-critical events and ensure that the team is only notified of issues that are actionable and threaten the SLO. What should you do?
Answer options
- A. Focus on cause-based alerts, creating alerting policies with thresholds for the Compute Engine instances, including CPU utilization, memory usage, disk I/O, and network traffic.
- B. Create log-based alerts for only the WARN and ERROR log entries generated by the application to ensure that no potential issue is missed.
- C. Implement an error budget policy based on the availability SLO. Create a "page" alert that triggers only when the error budget burn rate predicts full exhaustion within the next 24 hours.
- D. Configure alerts based on predictive metrics. Use the instance count of the MIG as the primary metric to trigger an alert.
Correct answer: C
Explanation
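The correct answer is C. An error budget policy ties alerting directly to the availability SLO. With a 99.9% target, the error budget is the 0.1% of requests that are allowed to fail, and a burn-rate alert pages only when failed requests are consuming that budget fast enough to threaten the SLO. Because the alert is driven by what users actually experience (non-2xx responses), it is both low-noise and actionable. Option A relies on cause-based alerts for CPU, memory, disk I/O, and network traffic. These resource metrics often cross thresholds without any user impact (for example, high CPU during a traffic peak that the MIG absorbs), which is exactly the noise the team is trying to eliminate. Option B adds noise rather than removing it: applications emit WARN and ERROR entries for many conditions that do not affect users, and log severity does not map to the request-success ratio that the SLO measures. Option D uses the MIG instance count, which can change routinely (for example, when an autoscaler adds or removes instances) and does not directly indicate whether requests are succeeding. It can therefore fire when users are unaffected and stay silent during a real outage.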
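To make option C concrete, here is a minimal Python sketch of the burn-rate arithmetic behind such an alert. It assumes a 30-day SLO compliance period and that the error rate observed in a recent window continues unchanged. Neither assumption comes from the question. `WindowCounts`, `burn_rate`, `hours_to_exhaustion`, and `should_page` are hypothetical helpers for illustration, not Cloud Monitoring APIs. In Cloud Monitoring itself you would define the SLO on the load balancer's request metrics and create a burn-rate alerting policy rather than computing this by hand.

```python
"""Minimal sketch of error-budget burn-rate paging for a 99.9% availability SLO.

Assumptions (not from the question): a 30-day compliance period, and that the
error rate seen in the recent window continues unchanged.
"""

from dataclasses import dataclass

SLO_TARGET = 0.999            # 99.9% of requests must return 2xx
PERIOD_HOURS = 30 * 24        # assumed 30-day compliance period
PAGE_HORIZON_HOURS = 24.0     # page if the budget would run out within 24 h


@dataclass
class WindowCounts:
    """Hypothetical request counts from a recent lookback window."""
    total: int
    good: int  # responses with a 2xx status code


def availability(w: WindowCounts) -> float:
    """SLI from the question: successful (2xx) requests / total requests."""
    if w.total == 0:
        return 1.0  # no traffic means no failed requests
    return w.good / w.total


def burn_rate(w: WindowCounts, slo: float = SLO_TARGET) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A burn rate of 1.0 spends exactly the whole budget over the period;
    a burn rate of 10.0 spends it ten times faster.
    """
    allowed_error_rate = 1.0 - slo          # the error budget, e.g. 0.1%
    observed_error_rate = 1.0 - availability(w)
    return observed_error_rate / allowed_error_rate


def hours_to_exhaustion(remaining_budget: float, rate: float,
                        period_hours: float = PERIOD_HOURS) -> float:
    """Hours until the remaining budget fraction (0..1) is used up at `rate`."""
    if rate <= 0:
        return float("inf")  # not burning budget at all
    return remaining_budget * period_hours / rate


def should_page(w: WindowCounts, remaining_budget: float,
                horizon_hours: float = PAGE_HORIZON_HOURS) -> bool:
    """Page only if, at the current burn rate, the budget runs out within the horizon."""
    return hours_to_exhaustion(remaining_budget, burn_rate(w)) <= horizon_hours


if __name__ == "__main__":
    # 1% errors: burn rate ~10, a full budget lasts ~72 h, so no page.
    print(should_page(WindowCounts(total=100_000, good=99_000), remaining_budget=1.0))
    # 5% errors: burn rate ~50, a full budget lasts ~14.4 h, so page.
    print(should_page(WindowCounts(total=100_000, good=95_000), remaining_budget=1.0))
```

Under these assumptions, with a full budget and a 720-hour period, paging when exhaustion is 24 hours away is equivalent to paging at a burn rate of 720 / 24 = 30. A sustained 1% error rate (burn rate 10) stays quiet, while a 5% error rate (burn rate 50) pages. As the remaining budget shrinks, lower burn rates become page-worthy.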