Google Cloud Professional Machine Learning Engineer — Question 210
You need to develop a custom TensorFlow model that will be used for online predictions. The training data is stored in BigQuery. You need to apply instance-level data transformations to the data for model training and serving. You want to use the same preprocessing routine during model training and serving. How should you configure the preprocessing routine?
Answer options
- A. Create a BigQuery script to preprocess the data, and write the result to another BigQuery table.
- B. Create a pipeline in Vertex AI Pipelines to read the data from BigQuery and preprocess it using a custom preprocessing component.
- C. Create a preprocessing function that reads and transforms the data from BigQuery. Create a Vertex AI custom prediction routine that calls the preprocessing function at serving time.
- D. Create an Apache Beam pipeline to read the data from BigQuery and preprocess it by using TensorFlow Transform and Dataflow.
Correct answer: D
Explanation
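The correct answer is D. With TensorFlow Transform, the preprocessing logic is written once as a preprocessing function. An Apache Beam pipeline, run on Dataflow, applies it to the training data read from BigQuery and also emits a transform graph that can be attached to the exported model, so the identical transformations run on each instance at serving time. This avoids training-serving skew. Option A transforms only the stored training table; online prediction requests never pass through the BigQuery script, so the logic would have to be re-implemented for serving. Option B preprocesses the training data in a custom pipeline component, but nothing carries that component's logic to the prediction endpoint. Option C does run preprocessing at serving time, but it adds a separate serving code path to keep in sync with training, and a function built to read from BigQuery does not suit low-latency online requests.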
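The consistency argument behind option D can be illustrated with a minimal sketch of the analyze-then-transform pattern. This is plain Python, not the TensorFlow Transform API: the names `analyze`, `make_transform`, and the `trip_miles` feature are hypothetical, chosen only for illustration. In TensorFlow Transform the same roles are played by a `preprocessing_fn` that calls analyzers such as `tft.scale_to_z_score`, with Beam running the full pass over the data.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]


def analyze(rows: List[Row], feature: str) -> Dict[str, float]:
    """Full pass over the training data: compute constants once."""
    values = [row[feature] for row in rows]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # Fall back to 1.0 so a constant feature does not divide by zero.
    return {"mean": mean, "std": math.sqrt(variance) or 1.0}


def make_transform(stats: Dict[str, float], feature: str) -> Callable[[Row], Row]:
    """Freeze the analyzed constants into a single instance-level function."""

    def transform(row: Row) -> Row:
        out = dict(row)
        # Uses constants from the analysis pass (z-score scaling).
        out[feature + "_scaled"] = (row[feature] - stats["mean"]) / stats["std"]
        # Purely instance-level: needs nothing beyond the row itself.
        out[feature + "_log"] = math.log1p(row[feature])
        return out

    return transform


# Hypothetical training rows standing in for data read from BigQuery.
training_rows = [{"trip_miles": 1.0}, {"trip_miles": 3.0}, {"trip_miles": 5.0}]

stats = analyze(training_rows, "trip_miles")
transform = make_transform(stats, "trip_miles")

# Training path: transform every stored row.
train_features = [transform(row) for row in training_rows]

# Serving path: the same function handles a single online request.
serving_features = transform({"trip_miles": 3.0})
```

Because both paths call the same `transform` with the same frozen statistics, a request that matches a training row produces identical features. Options A, B, and C each leave this guarantee to be maintained by hand.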