Google Cloud Professional Machine Learning Engineer — Question 39
You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?
Answer options
- A. Normalize the data using Google Kubernetes Engine.
- B. Translate the normalization algorithm into SQL for use with BigQuery.
- C. Use the normalizer_fn argument in TensorFlow's Feature Column API.
- D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
Correct answer: B
Explanation
The correct answer is B because translating the normalization algorithm into SQL allows for efficient processing directly within BigQuery, minimizing the need for manual intervention. Options A and D involve additional overhead by using external services, while C is not relevant as the normalization process occurs before the model training phase.