Google Cloud Professional Data Engineer — Question 256
You are preparing data to serve a sales demand prediction model. The training data undergoes several pre-processing steps, including scaling numerical features and one-hot encoding categorical features. The model is deployed on Vertex AI Endpoints. You need to prevent training-serving skew and ensure accurate predictions in production. You want a solution that is easy to implement.
What should you do?
Answer options
- A. Implement a custom handler within the Vertex AI Endpoint to automatically perform data transformations before the model makes a prediction.
- B. Replicate the exact same pre-processing logic in the inference pipeline that was used during model training.
- C. Store the raw, unprocessed data in a separate Cloud Storage bucket exclusively for serving.
- D. Ensure the serving data is a smaller, random sample of the training data.
Correct answer: B
Explanation
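The correct answer is B. Training-serving skew arises when features are computed differently at serving time than they were during training. Applying the same pre-processing logic in the inference pipeline, including the scaling parameters and category vocabulary fitted on the training data, ensures the model receives inputs in the same form it was trained on, which is critical for accurate predictions. Option A adds a custom handler that must be built and maintained, and it does not by itself guarantee that its transformations match those used in training. Option C only changes where raw data is stored; the data would still reach the model untransformed. Option D changes which data is served rather than how it is processed, and production requests are new data, not a sample of the training set.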
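One way to picture option B is a single transformation function whose fitted parameters are saved at training time and reloaded at serving time. The following is a minimal, standard-library-only sketch under stated assumptions: the helper names (`fit_preprocessor`, `transform`), the feature names (`price`, `region`), and the toy rows are all hypothetical, and a real pipeline would more likely rely on a library-provided transformer than on hand-written code.

```python
import json
import statistics


def fit_preprocessor(rows, numeric_keys, categorical_keys):
    """Learn scaling statistics and category vocabularies from training rows."""
    params = {"numeric": {}, "categorical": {}}
    for key in numeric_keys:
        values = [row[key] for row in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for constant columns
        params["numeric"][key] = {"mean": mean, "std": std}
    for key in categorical_keys:
        params["categorical"][key] = sorted({row[key] for row in rows})
    return params


def transform(row, params):
    """Apply fitted parameters to one row; shared by training and serving."""
    features = []
    for key, stats in params["numeric"].items():
        features.append((row[key] - stats["mean"]) / stats["std"])
    for key, vocab in params["categorical"].items():
        # One-hot encode; a category unseen in training becomes all zeros.
        features.extend(1.0 if row[key] == value else 0.0 for value in vocab)
    return features


# Training time (hypothetical toy data).
training_rows = [
    {"price": 10.0, "region": "east"},
    {"price": 20.0, "region": "west"},
    {"price": 30.0, "region": "east"},
]
params = fit_preprocessor(training_rows, ["price"], ["region"])
artifact = json.dumps(params)  # assumed to be stored alongside the model

# Serving time: reload the fitted parameters instead of refitting them.
serving_params = json.loads(artifact)
request = {"price": 20.0, "region": "west"}
serving_features = transform(request, serving_params)
training_features = transform(training_rows[1], params)
print(serving_features == training_features)
```

Because serving reuses both the same `transform` function and the parameters fitted on the training data, an identical raw record yields identical features in both settings. Refitting the scaler or rebuilding the vocabulary on serving data would reintroduce skew.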