Google Cloud Professional Machine Learning Engineer — Question 274
You are using Kubeflow Pipelines to develop an end-to-end PyTorch-based MLOps pipeline. The pipeline reads data from BigQuery, processes the data, performs feature engineering, model training, and model evaluation, and deploys the model as a binary file to Cloud Storage. You are writing code for several different versions of the feature engineering and model training steps, and running each new version in Vertex AI Pipelines. Each pipeline run takes over an hour to complete. You want to speed up pipeline execution to reduce your development time, and you want to avoid additional costs. What should you do?
Answer options
- A. Comment out the part of the pipeline that you are not currently updating.
- B. Enable caching in all the steps of the Kubeflow pipeline.
- C. Delegate feature engineering to BigQuery and remove it from the pipeline.
- D. Add a GPU to the model training step.
Correct answer: B
Explanation
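Enabling caching in all steps of the Kubeflow pipeline lets Vertex AI Pipelines reuse the outputs of earlier runs for any step whose definition and inputs are unchanged. While you iterate on feature engineering and model training, the upstream steps (reading from BigQuery and processing the data) are skipped instead of re-executed, which shortens each run without adding cost. Commenting out parts of the pipeline (A) is manual and error-prone, and it breaks downstream steps that depend on the outputs of the commented-out steps. Delegating feature engineering to BigQuery (C) removes a step you are actively iterating on and does not stop the other unchanged steps from re-running. Adding a GPU (D) can speed up training, but it adds cost, which you want to avoid.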
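To make the caching behavior concrete, here is a minimal pure-Python sketch of step-level caching. It is a toy model, not the Kubeflow or Vertex AI implementation: the `CachingRunner` class, the `run_pipeline` function, the step names, the version strings, and the table name are all hypothetical. It assumes a step's cache key is derived from its name, its code version, and its input values. In a real pipeline, caching is normally switched on through the SDK (for example `set_caching_options` on a task in the KFP SDK, or the `enable_caching` argument of `aiplatform.PipelineJob`), not implemented by hand.

```python
import hashlib
import json


class CachingRunner:
    """Toy pipeline runner that skips a step when its cache key is unchanged."""

    def __init__(self):
        self.cache = {}      # cache key -> stored step output
        self.executed = []   # names of steps that actually ran, in order

    def run_step(self, name, version, func, inputs):
        # Assumption: the key covers step name, code version, and input values.
        key_source = json.dumps(
            {"name": name, "version": version, "inputs": inputs}, sort_keys=True
        )
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]   # cache hit: reuse the earlier output
        output = func(inputs)        # cache miss: execute the step
        self.cache[key] = output
        self.executed.append(name)
        return output


def run_pipeline(runner, feature_version):
    """Hypothetical four-step pipeline; only feature engineering varies."""
    raw = runner.run_step(
        "read_bigquery", "v1", lambda _: [1, 2, 3, 4],
        {"table": "hypothetical.dataset.table"},
    )
    clean = runner.run_step(
        "preprocess", "v1", lambda i: [x for x in i["rows"] if x > 1],
        {"rows": raw},
    )
    scale = 10 if feature_version == "v1" else 100
    features = runner.run_step(
        "feature_engineering", feature_version,
        lambda i: [x * scale for x in i["rows"]],
        {"rows": clean},
    )
    return runner.run_step(
        "train", "v1", lambda i: sum(i["features"]) / len(i["features"]),
        {"features": features},
    )


if __name__ == "__main__":
    runner = CachingRunner()
    run_pipeline(runner, "v1")
    print(runner.executed)   # first run: all four steps execute
    runner.executed.clear()
    run_pipeline(runner, "v2")
    print(runner.executed)   # second run: only the changed step and its dependents
```

In the second run, the read and preprocess steps are served from the cache. The feature engineering step re-runs because its version changed, and the training step re-runs because its inputs changed. This is the pattern that makes option B effective during iterative development.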