Google Cloud Professional Data Engineer — Question 73
Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?
Answer options
- A. Check the dashboard application to see if it is not displaying correctly.
- B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
- C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
- D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.
Correct answer: B
Explanation
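The logs already confirm that every message reaches Cloud Pub/Sub, so the loss must occur downstream, and the Cloud Dataflow pipeline is the next stage to examine. Running a fixed dataset through the pipeline gives you a known input to compare against the output, which shows whether the transformation logic drops or mishandles specific messages (for example, records that fail JSON parsing or fall outside a window). The other options do not test the pipeline itself. Option A examines the dashboard before confirming that the pipeline delivers complete data to it. Option C relies on Stackdriver Monitoring, which reports aggregate metrics for Cloud Pub/Sub rather than the fate of individual messages, and publishing has already been verified. Option D does not address the cause: Cloud Dataflow's Pub/Sub connector already reads through a pull subscription, so there is no push delivery to switch away from.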
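The idea behind option B can be illustrated with a minimal sketch in plain Python. It is not the pipeline from the question: `FIXED_MESSAGES`, `transform`, and `find_dropped` are hypothetical names, and `transform` is only a stand-in for whatever per-message logic the real Dataflow job applies. The point is the method: feed a known input through the logic and list which message ids never come out.

```python
import json

# Hypothetical fixed dataset: a small, known set of inputs,
# including records that a fragile transform might drop.
FIXED_MESSAGES = [
    '{"id": 1, "amount": 10.5}',
    '{"id": 2, "amount": "n/a"}',  # amount is not numeric
    '{"id": 3}',                    # amount is missing
    '{"id": 4, "amount": 7}',
]


def transform(raw):
    """Stand-in for the pipeline's per-message logic (an assumption).

    Returns a dict for the dashboard, or None when the message is dropped.
    """
    try:
        record = json.loads(raw)
        return {"id": record["id"], "amount": float(record["amount"])}
    except (KeyError, ValueError, TypeError):
        # Silently swallowing the error is the kind of bug being hunted.
        return None


def find_dropped(messages):
    """Run each message through transform and return the ids that vanish."""
    dropped = []
    for raw in messages:
        if transform(raw) is None:
            dropped.append(json.loads(raw)["id"])
    return dropped


if __name__ == "__main__":
    print("Dropped message ids:", find_dropped(FIXED_MESSAGES))
```

Because the input is fixed, any id reported as dropped points directly at a record the processing logic cannot handle. In an actual Apache Beam job the same comparison would presumably be made by swapping the Pub/Sub source for a bounded, in-memory input and running the pipeline locally.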