Google Cloud Professional Machine Learning Engineer — Question 283
You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?
Answer options
- A. Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
- B. Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
- C. Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
- D. Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.
Correct answer: B
Explanation
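The correct answer is B. Vertex AI Pipelines is a serverless orchestrator, so there is no cluster or scheduler to provision or maintain, and the prebuilt Dataproc components from Google Cloud Pipeline Components let the pipeline submit PySpark jobs without any custom component code. Option A is incorrect because TFX is primarily oriented toward TensorFlow workflows rather than tree-based models, and writing a custom Dataproc component adds work that the predefined component already covers. Option C is incorrect because running Kubeflow Pipelines on Google Kubernetes Engine means managing the cluster and the Kubeflow deployment yourself, and it still requires a custom component. Option D is incorrect because Cloud Composer is a general-purpose, Airflow-based orchestrator: it requires running a Composer environment and lacks the ML-specific features of Vertex AI Pipelines, such as artifact and lineage tracking.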
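To make option B concrete, here is a minimal sketch of a Vertex AI pipeline whose preprocessing step uses a prebuilt Dataproc component. It assumes the `kfp` and `google-cloud-pipeline-components` packages are installed and uses `DataprocPySparkBatchOp`, which submits a Dataproc Serverless batch. The script path, the URIs, the pipeline name, and the helper names `pyspark_args` and `build_pipeline` are hypothetical and chosen only for illustration. The sketch also assumes a recent version of the components package in which a batch ID is generated when none is supplied.

```python
from typing import List


def pyspark_args(input_uri: str, output_uri: str) -> List[str]:
    """Build the argument list for the (hypothetical) PySpark preprocessing script."""
    return ["--input", input_uri, "--output", output_uri]


def build_pipeline(script_uri: str, input_uri: str, output_uri: str):
    """Return a KFP pipeline function with a Dataproc Serverless preprocessing step."""
    # Imported lazily so the helper above can be used without the SDKs installed.
    from kfp import dsl
    from google_cloud_pipeline_components.v1.dataproc import DataprocPySparkBatchOp

    @dsl.pipeline(name="tree-model-retraining")  # hypothetical pipeline name
    def pipeline(project: str, location: str):
        # Prebuilt component: runs the PySpark script as a serverless batch,
        # so there is no Dataproc cluster to create or manage.
        DataprocPySparkBatchOp(
            project=project,
            location=location,
            main_python_file_uri=script_uri,
            args=pyspark_args(input_uri, output_uri),
        )
        # Training, evaluation, and deployment steps would be chained after this one.

    return pipeline
```

The returned function could then be compiled with `kfp.compiler.Compiler().compile(...)` and submitted as a Vertex AI pipeline job. The point of the sketch is that the Dataproc step is a single call to an existing component rather than custom container code.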