Google Cloud Professional Machine Learning Engineer — Question 209
You have trained a model by using data that was preprocessed in a batch Dataflow pipeline. Your use case requires real-time inference. You want to ensure that the data preprocessing logic is applied consistently between training and serving. What should you do?
Answer options
- A. Perform data validation to ensure that the input data to the pipeline is in the same format as the input data to the endpoint.
- B. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Use the same code in the endpoint.
- C. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Share this code with the end users of the endpoint.
- D. Batch the real-time requests by using a time window and then use the Dataflow pipeline to preprocess the batched requests. Send the preprocessed requests to the endpoint.
Correct answer: B
Explanation
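The correct answer is B. The transformation logic used to prepare the training data also runs on every prediction request, so training and serving inputs go through identical preprocessing and training-serving skew is avoided. Option A is insufficient: checking that inputs share the same format says nothing about whether the same transformations are applied to them. Option C shifts responsibility to the endpoint's end users, who may apply the code inconsistently, run an outdated version, or skip it entirely, so the endpoint itself does not guarantee consistent preprocessing. Option D waits for a time window to fill and then runs a Dataflow job before each prediction. This adds latency that conflicts with the real-time inference requirement.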
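A minimal sketch of option B, assuming both the batch pipeline and the serving path are written in Python. The preprocessing is moved into a plain function with no Dataflow dependency, and both the batch step and the endpoint's request handler call it. The feature names (`amount`, `clicks`, `country`), the constants, and the helpers `preprocess`, `batch_transform`, and `handle_request` are hypothetical. The constants stand in for statistics that would normally be computed from the training data and saved with the model.

```python
import math
from typing import Any, Dict, List

# Hypothetical training-time statistics and vocabulary, for illustration only.
# In practice these would be computed from the training data and stored with
# the model artifact rather than hard-coded.
AMOUNT_MEAN = 50.0
AMOUNT_STD = 25.0
KNOWN_COUNTRIES = ["DE", "FR", "US"]


def preprocess(record: Dict[str, Any]) -> List[float]:
    """Shared transformation: one raw record in, one feature vector out.

    It has no dependency on Dataflow, so the same code can run inside the
    batch pipeline and in the online serving path.
    """
    amount = float(record.get("amount", 0.0))
    # Standardize with the training-time mean and standard deviation.
    scaled_amount = (amount - AMOUNT_MEAN) / AMOUNT_STD
    # Log-transform a non-negative count feature (negative values clamped to 0).
    log_clicks = math.log1p(max(0, int(record.get("clicks", 0))))
    # One-hot encode a categorical feature; unknown values map to all zeros.
    country = record.get("country")
    one_hot = [1.0 if country == c else 0.0 for c in KNOWN_COUNTRIES]
    return [scaled_amount, log_clicks] + one_hot


def batch_transform(records: List[Dict[str, Any]]) -> List[List[float]]:
    # Stand-in for the batch pipeline step. In an Apache Beam pipeline this
    # would typically be expressed as `records | beam.Map(preprocess)`.
    return [preprocess(r) for r in records]


def handle_request(instance: Dict[str, Any]) -> List[float]:
    # Stand-in for the endpoint's request handler: apply the same function
    # to the single incoming instance before passing it to the model.
    return preprocess(instance)


if __name__ == "__main__":
    raw = {"amount": 100.0, "clicks": 0, "country": "FR"}
    # Training and serving produce identical features for the same input.
    print(batch_transform([raw])[0] == handle_request(raw))  # True
```

Keeping the function free of pipeline-specific types is what makes it reusable. The endpoint can then import it, for example from a shared package installed in a custom serving container, instead of reimplementing the logic.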