Google Cloud Professional Cloud DevOps Engineer — Question 12

You encountered a major service outage that affected all users of the service for multiple hours. After several hours of incident management, the service returned to normal, and user access was restored. You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended practices. What should you do first?

Answer options

Correct answer: B

Explanation

The correct answer is B: developing a postmortem comes first because it analyzes the incident thoroughly and communicates the lessons learned. Option A is not effective because it does not give all stakeholders a comprehensive overview. Option C is informative but is not the first step; the postmortem matters more for understanding the incident. Option D focuses on apologies rather than on the root cause and preventive measures, which is not the priority in incident management.