Google Cloud Professional Machine Learning Engineer — Question 334

You are developing a training pipeline for a new XGBoost classification model based on tabular data. The data is stored in a BigQuery table. You need to complete the following steps:

1. Randomly split the data into training and evaluation datasets in a 65/35 ratio
2. Conduct feature engineering
3. Obtain metrics for the evaluation dataset
4. Compare models trained in different pipeline executions
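Step 1 asks for a random 65/35 split, and step 4 compares models across executions, so it helps if every execution holds out the same rows. One common way to get a split that is random-looking but repeatable is to hash a stable row key into 100 buckets. In BigQuery SQL this is often written with `FARM_FINGERPRINT`. The sketch below is a local Python illustration only, and the question does not prescribe this technique. `row_id`, `bucket`, and `hash_split` are hypothetical names. Because each row's side depends on its hash, the resulting ratio is approximately 65/35, not exactly.

```python
import hashlib
from typing import Dict, List, Tuple


def bucket(key: str, buckets: int = 100) -> int:
    """Map a stable row key to a bucket in [0, buckets) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def hash_split(
    rows: List[Dict], key_field: str = "row_id", train_pct: int = 65
) -> Tuple[List[Dict], List[Dict]]:
    """Send rows whose bucket is below train_pct to training, the rest to evaluation.

    key_field is a hypothetical stable identifier column; any column that never
    changes between pipeline executions would work.
    """
    train: List[Dict] = []
    evaluation: List[Dict] = []
    for row in rows:
        target = train if bucket(str(row[key_field])) < train_pct else evaluation
        target.append(row)
    return train, evaluation


if __name__ == "__main__":
    rows = [{"row_id": i, "feature": i * 0.1} for i in range(10_000)]
    train, evaluation = hash_split(rows)
    # Counts land close to 6,500 / 3,500 but are not exact.
    print(len(train), len(evaluation))
```

A seeded shuffle would give an exact 65/35 count instead, at the cost of reassigning rows whenever the order in which the table is read changes.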

How should you execute these steps?

Answer options

Correct answer: A

Explanation

Option A is correct because it uses Vertex AI Pipelines to split the data and perform feature engineering as pipeline steps, enables autologging so that evaluation metrics are recorded for each run, and supports comparing runs across pipeline executions. Options B and C are viable approaches, but they do not support comparing models across executions as effectively as A. Option D lacks the components needed to run these steps as a pipeline in the scenario described.
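The approach the explanation describes can be pictured as a chain of components whose evaluation metrics are recorded per run and then compared. The following is a stdlib-only stand-in, not Vertex AI code. Each function plays the role of one pipeline component, the dict `experiment` plays the role of an experiment tracker such as Vertex AI Experiments, and a toy threshold classifier replaces XGBoost. All function and run names are hypothetical. In a real pipeline the components would be defined with the Kubeflow Pipelines SDK that Vertex AI Pipelines runs, and metrics would be logged through Vertex AI rather than into a dict.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]  # (raw feature value, binary label)
Model = Callable[[float], int]


def split_component(rows: List[Row], train_fraction: float = 0.65, seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Step 1: seeded random split; a fixed seed keeps the evaluation set identical across runs."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def feature_engineering_component(train: List[Row], evaluation: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Step 2: standardize the feature with statistics fitted on the training split only."""
    values = [x for x, _ in train]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for a constant feature

    def scale(rows: List[Row]) -> List[Row]:
        return [((x - mean) / std, y) for x, y in rows]

    return scale(train), scale(evaluation)


def train_component(train: List[Row], model_type: str) -> Model:
    """Toy stand-ins for XGBoost: a majority-class baseline or a one-feature threshold."""
    if model_type == "baseline":
        majority = int(sum(y for _, y in train) * 2 >= len(train))
        return lambda x: majority

    def accuracy(t: float) -> float:
        return sum((x >= t) == bool(y) for x, y in train) / len(train)

    best = max(sorted({x for x, _ in train}), key=accuracy)
    return lambda x: int(x >= best)


def evaluate_component(model: Model, evaluation: List[Row]) -> Dict[str, float]:
    """Step 3: metrics computed on the held-out 35% only."""
    correct = sum(model(x) == y for x, y in evaluation)
    return {"accuracy": correct / len(evaluation)}


def run_pipeline(rows: List[Row], model_type: str, experiment: Dict[str, Dict[str, float]]) -> None:
    """One pipeline execution; its metrics are stored under a run name, like a tracked run."""
    train, evaluation = split_component(rows)
    train, evaluation = feature_engineering_component(train, evaluation)
    model = train_component(train, model_type)
    experiment[f"run-{model_type}"] = evaluate_component(model, evaluation)


def best_run(experiment: Dict[str, Dict[str, float]], metric: str = "accuracy") -> str:
    """Step 4: compare the recorded runs and return the name of the best one."""
    return max(experiment, key=lambda name: experiment[name][metric])


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: the label is 1 when the feature plus Gaussian noise exceeds 5.
    xs = [rng.uniform(0, 10) for _ in range(500)]
    rows = [(x, int(x + rng.gauss(0, 1) > 5)) for x in xs]
    experiment: Dict[str, Dict[str, float]] = {}
    for model_type in ("baseline", "threshold"):
        run_pipeline(rows, model_type, experiment)
    for name, metrics in experiment.items():
        print(name, metrics)
    print("best:", best_run(experiment))
```

Keeping the split seed fixed across runs means both models are scored on the same 35% of rows, which is what makes the comparison in step 4 meaningful.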