Google Cloud Professional Machine Learning Engineer — Question 334
You are developing a training pipeline for a new XGBoost classification model based on tabular data. The data is stored in a BigQuery table. You need to complete the following steps:
1. Randomly split the data into training and evaluation datasets in a 65/35 ratio
2. Conduct feature engineering
3. Obtain metrics for the evaluation dataset
4. Compare models trained in different pipeline executions
How should you execute these steps?
Answer options
- A. 1. Using Vertex AI Pipelines, add a component to divide the data into training and evaluation sets, and add another component for feature engineering. 2. Enable autologging of metrics in the training component. 3. Compare pipeline runs in Vertex AI Experiments.
- B. 1. Using Vertex AI Pipelines, add a component to divide the data into training and evaluation sets, and add another component for feature engineering. 2. Enable autologging of metrics in the training component. 3. Compare models using the artifacts’ lineage in Vertex ML Metadata.
- C. 1. In BigQuery ML, use the CREATE MODEL statement with BOOSTED_TREE_CLASSIFIER as the model type and use BigQuery to handle the data splits. 2. Use a SQL view to apply feature engineering and train the model using the data in that view. 3. Compare the evaluation metrics of the models by using a SQL query with the ML.TRAINING_INFO statement.
- D. 1. In BigQuery ML, use the CREATE MODEL statement with BOOSTED_TREE_CLASSIFIER as the model type and use BigQuery to handle the data splits. 2. Use ML TRANSFORM to specify the feature engineering transformations and train the model using the data in the table. 3. Compare the evaluation metrics of the models by using a SQL query with the ML.TRAINING_INFO statement.
Correct answer: A
Explanation
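Option A is correct. Vertex AI Pipelines lets you implement the 65/35 random split and the feature engineering as separate components, autologging in the training component records parameters and metrics for each run, and Vertex AI Experiments is the tool designed for comparing runs, and therefore models, across pipeline executions. Option B uses the same pipeline but compares models through artifact lineage in Vertex ML Metadata, which traces how an artifact was produced rather than presenting metrics from different runs side by side. Options C and D move training into BigQuery ML, which can handle both the split and the feature engineering, but both rely on ML.TRAINING_INFO for the comparison. That function reports per-iteration training information (such as loss) for a single model, not evaluation metrics across separately trained models, so neither option satisfies step 4.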
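
As a rough illustration of steps 1 and 4, the following is a minimal sketch, not a definitive implementation. The helper names (`split_rows`, `log_run_to_experiment`) and the project, location, and experiment values are hypothetical placeholders. The second function assumes the `google-cloud-aiplatform` SDK is installed and credentials are configured, and it logs metrics explicitly to stay short; in a real training component, calling `aiplatform.autolog()` after `aiplatform.init(...)` would be the autologging route the answer describes.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_rows(
    rows: Sequence[dict], train_fraction: float = 0.65, seed: int = 42
) -> Tuple[List[dict], List[dict]]:
    """Randomly split rows into training and evaluation sets (65/35 by default)."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducible splits
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def log_run_to_experiment(
    run_name: str, params: Dict[str, float], metrics: Dict[str, float]
) -> None:
    """Record one pipeline execution as a run in Vertex AI Experiments.

    Assumption: project, location, and experiment names below are placeholders.
    """
    # Imported lazily so split_rows works even without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",          # hypothetical project ID
        location="us-central1",        # hypothetical region
        experiment="xgb-tabular",      # hypothetical experiment name
    )
    aiplatform.start_run(run_name)
    aiplatform.log_params(params)      # e.g. {"max_depth": 6, "eta": 0.1}
    aiplatform.log_metrics(metrics)    # e.g. {"eval_auc": 0.91}
    aiplatform.end_run()
```

Once several executions have logged runs to the same experiment, `aiplatform.get_experiment_df()` returns a DataFrame with one row per run, which gives the side-by-side comparison that step 4 asks for.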