Google Cloud Professional Machine Learning Engineer — Question 181
You work for a food product company. Your company’s historical sales data is stored in BigQuery. You need to use Vertex AI’s custom training service to train multiple TensorFlow models that read the data from BigQuery and predict future sales. You plan to implement a data preprocessing algorithm that performs min-max scaling and bucketing on a large number of features before you start experimenting with the models. You want to minimize preprocessing time, cost, and development effort. How should you configure this workflow?
Answer options
- A. Write the transformations into Spark that uses the spark-bigquery-connector, and use Dataproc to preprocess the data.
- B. Write SQL queries to transform the data in-place in BigQuery.
- C. Add the transformations as a preprocessing layer in the TensorFlow models.
- D. Create a Dataflow pipeline that uses the BigQueryIO connector to ingest the data, process it, and write it back to BigQuery.
Correct answer: B
Explanation
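The correct answer is B. The data already lives in BigQuery, and min-max scaling and bucketing can both be expressed in SQL, so the transformations run where the data is stored, with no extra cluster or pipeline to provision and no data movement. That keeps preprocessing time, cost, and development effort low. Options A and D require writing and operating a separate Spark or Dataflow job that reads the data out of BigQuery, processes it, and writes it back, which adds services, code, and cost for transformations that SQL already covers. Option C puts the preprocessing inside each TensorFlow model, so the same scaling and bucketing is repeated for every model and every training run, which works against the goal of preprocessing once before experimenting with multiple models.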
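As a rough illustration of option B, the sketch below builds a BigQuery SQL statement that min-max scales some columns and buckets others. It is only one possible shape for such a query, not an official solution. The function name build_preprocessing_sql, the table names, the column names, and the bucket boundaries are all hypothetical; SAFE_DIVIDE, RANGE_BUCKET, and the MIN/MAX window functions are standard BigQuery SQL.

```python
def build_preprocessing_sql(source_table, dest_table, scale_cols, bucket_cols):
    """Return a BigQuery SQL statement that writes preprocessed features.

    scale_cols:  list of numeric column names to min-max scale to [0, 1].
    bucket_cols: dict mapping a column name to a sorted list of boundaries.
    All names passed in are illustrative assumptions, not from the question.
    """
    select_parts = []

    # Min-max scaling: (x - min) / (max - min), computed over the whole table.
    # SAFE_DIVIDE yields NULL instead of an error when max equals min.
    for col in scale_cols:
        select_parts.append(
            f"SAFE_DIVIDE({col} - MIN({col}) OVER (), "
            f"MAX({col}) OVER () - MIN({col}) OVER ()) AS {col}_scaled"
        )

    # Bucketing: RANGE_BUCKET maps each value to the index of its bucket.
    for col, boundaries in bucket_cols.items():
        bounds = ", ".join(str(b) for b in boundaries)
        select_parts.append(f"RANGE_BUCKET({col}, [{bounds}]) AS {col}_bucket")

    select_list = ",\n  ".join(select_parts)
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT\n  {select_list}\nFROM `{source_table}`"
    )


# Hypothetical tables and features, used only to show the generated query.
sql = build_preprocessing_sql(
    source_table="my-project.sales.history",
    dest_table="my-project.sales.history_preprocessed",
    scale_cols=["unit_price"],
    bucket_cols={"units_sold": [10, 100, 1000]},
)
print(sql)
```

The generated statement could be run once in BigQuery, and each TensorFlow training job would then read the preprocessed table instead of repeating the transformations.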