Google Cloud Professional Cloud DevOps Engineer — Question 12
You encountered a major service outage that affected all users of the service for multiple hours. After several hours of incident management, the service returned to normal, and user access was restored. You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended practices. What should you do first?
Answer options
- A. Call individual stakeholders to explain what happened.
- B. Develop a post-mortem to be distributed to stakeholders.
- C. Send the Incident State Document to all the stakeholders.
- D. Require the engineer responsible to write an apology email to all stakeholders.
Correct answer: B
Explanation
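The correct answer is B. SRE practice calls for a written post-mortem after a major incident: it records the impact, the timeline, the root causes, and the follow-up actions that prevent recurrence, and it gives every stakeholder the same account of what happened. Option A is ineffective because individual calls do not scale, leave no written record, and risk giving stakeholders inconsistent accounts. Option C is not the right first step: the Incident State Document is a live working document that responders update during the incident, not a summary written for stakeholders. Option D conflicts with the blameless culture that SRE recommends; singling out one engineer for an apology does nothing to address root causes or preventive measures.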