Google Cloud Professional Cloud DevOps Engineer — Question 22
You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests, and all of its dependent systems, with hundreds of thousands of users, are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL). What should you do next?
Answer options
- A. Look for ways to mitigate user impact and deploy the mitigations to production.
- B. Contact the affected service owners and update them on the status of the incident.
- C. Establish a communication channel where incident responders and leads can communicate with each other.
- D. Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input.
Correct answer: C
Explanation
The correct answer is C. With the IC, OL, and CL roles assigned, the responders need a shared communication channel so that everyone works from the same information and the IC can coordinate the response. Options A and B are important, but they come after the channel is in place: mitigation (A) is the Operations Lead's work and stakeholder updates (B) are the Communications Lead's work, and both rely on the team being able to coordinate. Option D is premature: a postmortem analyzes the incident after the fact, whereas the focus at this stage should be on resolving it.