Google Cloud Professional Machine Learning Engineer — Question 226
You are investigating the root cause of a misclassification error made by one of your models. You used Vertex AI Pipelines to train and deploy the model. The pipeline reads data from BigQuery, creates a copy of the data in Cloud Storage in TFRecord format, trains the model in Vertex AI Training on that copy, and deploys the model to a Vertex AI endpoint. You have identified the specific version of the model that misclassified, and you need to recover the data this model was trained on. How should you find that copy of the data?
Answer options
- A. Use Vertex AI Feature Store. Modify the pipeline to use the feature store, and ensure that all training data is stored in it. Search the feature store for the data used for the training.
- B. Use the lineage feature of Vertex AI Metadata to find the model artifact. Determine the version of the model and identify the step that creates the data copy and search in the metadata for its location.
- C. Use the logging features in the Vertex AI endpoint to determine the timestamp of the model’s deployment. Find the pipeline run at that timestamp. Identify the step that creates the data copy, and search in the logs for its location.
- D. Find the job ID in Vertex AI Training corresponding to the training for the model. Search in the logs of that job for the data used for the training.
Correct answer: B
Explanation
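The correct answer is B. Vertex AI Pipelines automatically records each run's artifacts and executions in Vertex ML Metadata (called Vertex AI Metadata in the answer options), including the lineage that links a model artifact to the steps and input artifacts that produced it. Starting from the model version that misclassified, you can follow the lineage back to the step that created the TFRecord copy and read the copy's Cloud Storage location from the metadata. Option A requires changing the pipeline, so it cannot recover the data used by a model that has already been trained. Options C and D both rely on logs: matching a pipeline run to a deployment timestamp is indirect and can be ambiguous, and neither endpoint logs nor training job logs are guaranteed to record the location of the data copy.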
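The lineage lookup in option B is a backward walk over a graph of artifacts and executions. The sketch below illustrates that walk in plain Python on hand-written records. It does not call the Vertex AI SDK. The artifact names, bucket paths, and the `upstream_datasets` helper are hypothetical, and the record layout (artifacts with a schema title and URI, events that mark an artifact as an INPUT or OUTPUT of an execution) is an assumption modeled on how lineage is typically described for Vertex ML Metadata.

```python
from collections import deque

# Hypothetical lineage records; a real pipeline would supply these from its
# metadata store. Names and URIs are made up for illustration.
artifacts = {
    "model-v2": {"schema_title": "system.Model", "uri": "gs://example-bucket/models/v2"},
    "model-v3": {"schema_title": "system.Model", "uri": "gs://example-bucket/models/v3"},
    "tfrecords-run41": {"schema_title": "system.Dataset", "uri": "gs://example-bucket/tfrecords/run41"},
    "tfrecords-run42": {"schema_title": "system.Dataset", "uri": "gs://example-bucket/tfrecords/run42"},
}

# Each event links one execution (pipeline step) to one artifact.
events = [
    ("export-run41", "tfrecords-run41", "OUTPUT"),
    ("train-run41", "tfrecords-run41", "INPUT"),
    ("train-run41", "model-v2", "OUTPUT"),
    ("export-run42", "tfrecords-run42", "OUTPUT"),
    ("train-run42", "tfrecords-run42", "INPUT"),
    ("train-run42", "model-v3", "OUTPUT"),
]


def upstream_datasets(model_id, artifacts, events):
    """Return URIs of dataset artifacts that are ancestors of model_id."""
    producers = {}  # artifact -> executions that produced it
    inputs = {}     # execution -> artifacts it consumed
    for execution, artifact, kind in events:
        if kind == "OUTPUT":
            producers.setdefault(artifact, []).append(execution)
        elif kind == "INPUT":
            inputs.setdefault(execution, []).append(artifact)

    seen = {model_id}
    queue = deque([model_id])
    found = []
    # Breadth-first walk from the model back through producing executions.
    while queue:
        current = queue.popleft()
        for execution in producers.get(current, []):
            for parent in inputs.get(execution, []):
                if parent in seen:
                    continue
                seen.add(parent)
                queue.append(parent)
                if artifacts[parent]["schema_title"] == "system.Dataset":
                    found.append(artifacts[parent]["uri"])
    return found


training_data_uris = upstream_datasets("model-v3", artifacts, events)
print(training_data_uris)
```

Because the walk starts from one specific model artifact, it returns only the data copy that fed that version, even when other runs produced similar copies. A timestamp-based log search, as in option C, cannot guarantee this.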