Google Cloud Associate Cloud Engineer — Question 276
Your company was recently impacted by a service disruption that caused multiple Dataflow jobs to get stuck, resulting in significant downtime in downstream applications and revenue loss. You were able to resolve the issue by identifying and fixing an error you found in the code. You need to design a solution with minimal management effort to identify when jobs are stuck in the future to ensure that this issue does not occur again. What should you do?
Answer options
- A. Update the Dataflow job configurations to send messages to a Pub/Sub topic when there are delays. Configure a backup Dataflow job to process jobs that are delayed. Use Cloud Tasks to trigger an alert when messages are pushed to the Pub/Sub topic.
- B. Set up Cloud Monitoring alerts on the data freshness metric for the Dataflow jobs to receive a notification when a certain threshold is reached.
- C. Set up Error Reporting to identify stack traces that indicate slowdowns in Dataflow jobs. Set up alerts based on these log entries.
- D. Use the Personalized Service Health dashboard to identify issues with Dataflow jobs across regions.
Correct answer: B
Explanation
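The correct answer is B. In Dataflow, data freshness measures how far the pipeline's output lags behind real time, so a stuck job shows up as a freshness value that keeps growing. A Cloud Monitoring alerting policy on that metric sends a notification once the value exceeds a chosen threshold. Because it relies on built-in metrics and alerting, there is nothing extra to build or maintain. Option A requires custom code, a backup job, and ongoing maintenance, and Cloud Tasks is a task queue for dispatching work, not an alerting service. Option C depends on the job raising errors, but a stuck job can stall without producing any stack trace; Error Reporting groups application errors and does not measure slowdowns. Option D reports incidents in Google Cloud services themselves, so it would not catch a job that is stuck because of a bug in your own code.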
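As a rough illustration of option B, the sketch below builds an alerting policy body in the JSON shape used by the Cloud Monitoring API's AlertPolicy resource. It is only a sketch under stated assumptions: the metric type `dataflow.googleapis.com/job/data_watermark_age` is assumed to be the metric behind data freshness and should be verified in Metrics Explorer, and the function name, threshold, duration, and notification channel ID are hypothetical placeholders, not values from the question.

```python
import json

# Assumption: this metric type backs the "data freshness" chart for Dataflow jobs.
# Verify the exact name in Metrics Explorer before relying on it.
FRESHNESS_METRIC = "dataflow.googleapis.com/job/data_watermark_age"


def build_freshness_alert_policy(threshold_seconds, duration_seconds, notification_channels):
    """Return an alert policy body (a plain dict) for a data freshness threshold.

    The policy fires when freshness stays above threshold_seconds for at
    least duration_seconds. This helper is hypothetical and only assembles
    the JSON structure; it does not call any Google Cloud API.
    """
    metric_filter = (
        f'metric.type = "{FRESHNESS_METRIC}" AND resource.type = "dataflow_job"'
    )
    return {
        "displayName": "Dataflow job data freshness too high",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"Data freshness above {threshold_seconds}s",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_seconds,
                    "duration": f"{duration_seconds}s",
                    "aggregations": [
                        # Take the worst (largest) freshness value per minute.
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MAX"}
                    ],
                },
            }
        ],
        "notificationChannels": list(notification_channels),
    }


if __name__ == "__main__":
    # Placeholder channel ID; replace with a real notification channel name.
    policy = build_freshness_alert_policy(
        threshold_seconds=600,
        duration_seconds=300,
        notification_channels=["projects/my-project/notificationChannels/123"],
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON could then be reviewed and submitted through the Cloud Monitoring API or the console. Appropriate threshold and duration values depend on how much lag the downstream applications can tolerate.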