Google Cloud Professional Data Engineer — Question 130
You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near-real time from multiple vendors. The data may contain invalid values. What should you do?
Answer options
- A. Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the "ingestion" dataset as the training data.
- B. Use BigQuery streaming inserts to land the data from multiple vendors in the BigQuery dataset where your BigQuery ML model is deployed.
- C. Create a Pub/Sub topic and send all vendor data to it. Connect a Cloud Function to the topic to process the data and store it in BigQuery.
- D. Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
Correct answer: D
Explanation
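The correct answer is D. Pub/Sub gives all vendors a single, decoupled ingestion point for continuous streaming data, and a streaming Dataflow pipeline can validate and sanitize each record (for example, rejecting or correcting invalid values and routing bad records to a separate dead-letter destination) before writing clean rows to BigQuery, where the BigQuery ML model and its Vertex AI endpoint can use them in near-real time. Option A lands raw data in a new dataset with streaming inserts and uses it directly as training data, so invalid values flow straight into the model. Option B streams raw data into the dataset where the model is deployed, again with no step that addresses data quality. Option C does add a processing step, and a Cloud Function could perform some validation, but Cloud Functions handle one event per invocation and lack Dataflow's stream-processing features such as windowing, stateful processing, and exactly-once semantics, which makes them a weaker fit for sanitizing continuous, high-volume streams from multiple vendors.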
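The sanitization step that makes D the better choice can be sketched in plain Python. This is a minimal sketch under stated assumptions: vendor payloads are assumed to be JSON with hypothetical fields `vendor_id`, `event_time`, and `value`, and the function names `sanitize` and `split_batch` are illustrative only. In an actual Dataflow job, logic like this would typically run inside an Apache Beam `ParDo`/`DoFn` between `ReadFromPubSub` and `WriteToBigQuery`, with rejected records emitted to a separate tagged output that acts as a dead-letter sink instead of being dropped.

```python
from __future__ import annotations

import json
import math
from datetime import datetime
from typing import Any, Iterable


def sanitize(payload: bytes) -> tuple[dict[str, Any] | None, str | None]:
    """Parse and validate one Pub/Sub message body.

    Returns (row, None) for a clean row, or (None, reason) for a bad one.
    The field names vendor_id / event_time / value are hypothetical.
    """
    try:
        record = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None, "unparseable payload"
    if not isinstance(record, dict):
        return None, "payload is not a JSON object"

    vendor = record.get("vendor_id")
    if not isinstance(vendor, str) or not vendor.strip():
        return None, "missing vendor_id"

    # Note: before Python 3.11, fromisoformat rejects a trailing "Z".
    try:
        event_time = datetime.fromisoformat(str(record.get("event_time")))
    except ValueError:
        return None, "invalid event_time"

    raw_value = record.get("value")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw_value, bool) or not isinstance(raw_value, (int, float, str)):
        return None, "invalid value"
    try:
        value = float(raw_value)
    except ValueError:
        return None, "invalid value"
    # json.loads accepts NaN/Infinity literals, so check finiteness too.
    if not math.isfinite(value):
        return None, "non-finite value"

    return {
        "vendor_id": vendor.strip(),
        "event_time": event_time.isoformat(),
        "value": value,
    }, None


def split_batch(
    payloads: Iterable[bytes],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Route each message to the clean output or the dead-letter output."""
    clean_rows: list[dict[str, Any]] = []
    dead_letter: list[dict[str, Any]] = []
    for payload in payloads:
        row, error = sanitize(payload)
        if row is not None:
            clean_rows.append(row)
        else:
            # Keep the raw text and the reason so bad records can be inspected later.
            dead_letter.append(
                {"raw": payload.decode("utf-8", errors="replace"), "error": error}
            )
    return clean_rows, dead_letter


if __name__ == "__main__":
    messages = [
        b'{"vendor_id": "acme", "event_time": "2024-05-01T12:00:00+00:00", "value": "42.5"}',
        b'{"vendor_id": "acme", "event_time": "not-a-time", "value": 1}',
        b'{"vendor_id": "globex", "event_time": "2024-05-01T12:00:05+00:00", "value": NaN}',
        b"not json",
    ]
    clean, bad = split_batch(messages)
    print(len(clean), len(bad))  # 1 3
```

Routing invalid records to a dead-letter output, rather than failing the pipeline or silently discarding them, keeps the stream flowing into BigQuery while preserving the bad values for later inspection or reprocessing.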