Google Cloud Professional Machine Learning Engineer — Question 311
You need to train an XGBoost model on a small dataset. Your training code requires custom dependencies. You need to set up a Vertex AI custom training job. You want to minimize the startup time of the training job while following Google-recommended practices. What should you do?
Answer options
- A. Create a custom container that includes the data and the custom dependencies. In your training application, load the data into a pandas DataFrame and train the model.
- B. Store the data in a Cloud Storage bucket, and use the XGBoost prebuilt custom container to run your training application. Create a Python source distribution that installs the custom dependencies at runtime. In your training application, read the data from Cloud Storage and train the model.
- C. Use the XGBoost prebuilt custom container. Create a Python source distribution that includes the data and installs the custom dependencies at runtime. In your training application, load the data into a pandas DataFrame and train the model.
- D. Store the data in a Cloud Storage bucket, and create a custom container with your training application and its custom dependencies. In your training application, read the data from Cloud Storage and train the model.
Correct answer: D
Explanation
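The correct answer is D. A custom container image has the custom dependencies pre-installed, so nothing has to be installed when the job starts, which minimizes startup time. Keeping the data in Cloud Storage rather than inside the image or package follows Google's recommended practice of separating training data from training code. Option A bakes the data into the container image, so the image must be rebuilt and re-pushed whenever the data changes. Options B and C both use a Python source distribution that installs the custom dependencies at runtime, which adds installation time to every job start. Option C also bundles the data into the source distribution instead of storing it in Cloud Storage.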
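The training application described in option D can be sketched as follows. This is a minimal, hypothetical example, not Google's reference code. It makes these assumptions:

- The data is a headerless CSV at the placeholder URI `gs://example-bucket/train.csv`, with the label in the first column. The location can be overridden through `TRAIN_DATA_URI`, a hypothetical environment variable.
- Vertex AI custom training mounts Cloud Storage buckets under `/gcs/` through Cloud Storage FUSE, so `gs://` URIs can be read as local files.
- `AIP_MODEL_DIR` is the output location Vertex AI sets when the job has a base output directory. The `/tmp/model` fallback is an assumption for local runs.
- `xgboost` and `numpy` are pre-installed in the custom container image.

The standard-library `csv` module stands in for pandas to keep the sketch small.

```python
from __future__ import annotations

import csv
import os

# Vertex AI custom training mounts Cloud Storage buckets under this path (Cloud Storage FUSE).
FUSE_ROOT = "/gcs/"


def to_fuse_path(uri: str) -> str:
    """Map gs://bucket/object to /gcs/bucket/object; leave local paths unchanged."""
    if uri.startswith("gs://"):
        return FUSE_ROOT + uri[len("gs://"):]
    return uri


def parse_rows(rows) -> tuple[list[list[float]], list[float]]:
    """Split CSV rows into features and labels (assumes label in column 0, no header)."""
    features: list[list[float]] = []
    labels: list[float] = []
    for row in rows:
        if not row:  # skip blank lines
            continue
        labels.append(float(row[0]))
        features.append([float(v) for v in row[1:]])
    return features, labels


def load_csv(path: str) -> tuple[list[list[float]], list[float]]:
    """Read a small CSV file fully into memory."""
    with open(path, newline="") as f:
        return parse_rows(csv.reader(f))


def train(features: list[list[float]], labels: list[float], model_dir: str) -> str:
    # Imported here so the helpers above work without these packages. In the
    # custom container they are pre-installed, so no pip install runs at job startup.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.asarray(features, dtype=np.float32), label=np.asarray(labels))
    # The regression objective and round count are illustrative assumptions.
    booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=50)
    os.makedirs(model_dir, exist_ok=True)
    out_path = os.path.join(model_dir, "model.json")
    booster.save_model(out_path)
    return out_path


def main() -> None:
    # TRAIN_DATA_URI and the bucket name are hypothetical; AIP_MODEL_DIR is set by Vertex AI.
    data_uri = os.environ.get("TRAIN_DATA_URI", "gs://example-bucket/train.csv")
    model_dir = to_fuse_path(os.environ.get("AIP_MODEL_DIR", "/tmp/model"))
    features, labels = load_csv(to_fuse_path(data_uri))
    print("Saved model to", train(features, labels, model_dir))
```

Inside the image, the container's entrypoint would call `main()`. The `/gcs/` mount exists only while the job runs on Vertex AI, so a local run would need local paths in `TRAIN_DATA_URI` and `AIP_MODEL_DIR`.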