Google Cloud Professional Machine Learning Engineer — Question 142
While running a model training pipeline on Vertex AI, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?
Answer options
- A. Include the flag --runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow.
- B. Move the evaluation step out of your pipeline and run it on custom Compute Engine VMs with sufficient memory.
- C. Migrate your pipeline to Kubeflow hosted on Google Kubernetes Engine, and specify the appropriate node parameters for the evaluation step.
- D. Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step.
Correct answer: A
Explanation
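Option A is correct. The TFX Evaluator computes TFMA metrics with Apache Beam, and by default Beam runs locally inside the memory of the single machine that executes the step. Setting --runner=DataflowRunner in beam_pipeline_args sends the same evaluation job to Dataflow, a managed service that distributes the work across autoscaled workers, so the evaluation data no longer has to fit on one machine. The metrics stay unchanged and there is no extra infrastructure to operate. Option B takes the evaluation step out of the pipeline and leaves you to provision and maintain the VMs yourself, which increases infrastructure overhead. Option C adds the work of migrating the pipeline and managing a GKE cluster, which also increases overhead rather than reducing it. Option D reduces the number of metrics computed, which downgrades evaluation quality and does not address the root cause of the memory issue.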
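As an illustration of option A, the following is a minimal sketch of how the Beam arguments might be assembled. The helper name `dataflow_beam_args` and the project, region, and bucket values are hypothetical placeholders, not part of the question. The flags themselves are standard Apache Beam pipeline options. The block only builds the argument list, so it runs without TFX installed; the comments at the end show where the list would be passed in a TFX pipeline.

```python
from typing import List


def dataflow_beam_args(project: str, region: str, temp_location: str) -> List[str]:
    """Build Beam pipeline args that send Beam-based TFX steps to Dataflow.

    Hypothetical helper for illustration; the flags are standard Beam options.
    """
    return [
        "--runner=DataflowRunner",            # run on Dataflow instead of locally
        f"--project={project}",               # GCP project that owns the job
        f"--region={region}",                 # Dataflow regional endpoint
        f"--temp_location={temp_location}",   # GCS path for temporary files
    ]


# Placeholder values (assumptions, not taken from the question).
beam_pipeline_args = dataflow_beam_args(
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)
print(beam_pipeline_args)

# With TFX installed, the list could be attached to the Evaluator only:
#   evaluator = tfx.components.Evaluator(...).with_beam_pipeline_args(beam_pipeline_args)
# or applied to every Beam-based component in the pipeline:
#   tfx.dsl.Pipeline(..., beam_pipeline_args=beam_pipeline_args)
```

Attaching the arguments to the Evaluator alone keeps the other components on their current runner, which may be preferable when only the evaluation step runs out of memory.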