Google Cloud Professional Machine Learning Engineer — Question 283

You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?

Answer options

Correct answer: B

Explanation

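The correct answer is B. Vertex AI Pipelines is a serverless orchestrator for MLOps workflows, so there is no cluster or environment to provision. The predefined Dataproc components from Google Cloud Pipeline Components (for example `DataprocPySparkBatchOp`) run the existing PySpark preprocessing code as Dataproc Serverless batches without custom integration code. Together, these choices minimize infrastructure management. Option A is incorrect because writing a custom component for the PySpark step adds development and maintenance work that the predefined component already covers. Option C relies on Kubeflow, which typically means running Kubeflow Pipelines on a cluster you manage yourself rather than using the managed Vertex AI service. Option D uses Cloud Composer, a managed Apache Airflow service. It can orchestrate Dataproc jobs, but it is a general-purpose workflow tool whose environment you still configure and maintain, and it lacks the ML-specific integration, such as metadata and artifact tracking, that Vertex AI Pipelines provides.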