Google Cloud Professional Machine Learning Engineer — Question 177

You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?

Answer options

Correct answer: C

Explanation

The correct answer is C because the Kubeflow Pipelines SDK lets you define feature engineering and model training as two components of a single pipeline. The Dataproc Serverless job and the Vertex AI custom training job then run in order without manual intervention, and the pipeline records the dependency between them, so the workflow is structured, maintainable, and easy to monitor. Options A and B rely on running the steps manually from a notebook, which is neither scalable nor maintainable in production. Option D also defines a pipeline, but it runs the PySpark code in an Apache Spark context instead of using Dataproc Serverless, which the feature engineering code already runs on.
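The pipeline described for option C could be sketched roughly as follows. This is a minimal illustration, not the exam's reference solution. It assumes the `kfp` and `google-cloud-pipeline-components` packages, and it assumes the feature engineering step is launched with `DataprocPySparkBatchOp` and the training step is wrapped with `create_custom_training_job_from_component`. The function name `build_pipeline`, the `train_model` placeholder, the `--output` argument, the display name, and the machine type are all hypothetical.

```python
def build_pipeline(project: str, location: str, pyspark_uri: str, features_uri: str):
    """Return a KFP pipeline function that chains feature engineering and training."""
    # Imported here so this sketch can be loaded without the SDKs installed.
    from kfp import dsl
    from google_cloud_pipeline_components.v1.custom_job import (
        create_custom_training_job_from_component,
    )
    from google_cloud_pipeline_components.v1.dataproc import DataprocPySparkBatchOp

    @dsl.component(base_image="python:3.10")
    def train_model(features_uri: str):
        # Placeholder body: the real training code would go here.
        print(f"Training on features at {features_uri}")

    # Wrap the component so it runs as a Vertex AI custom training job.
    train_job_op = create_custom_training_job_from_component(
        train_model,
        display_name="train-model",    # hypothetical display name
        machine_type="n1-standard-4",  # hypothetical machine type
    )

    @dsl.pipeline(name="feature-engineering-and-training")
    def pipeline():
        # Step 1: run the existing PySpark script as a Dataproc Serverless batch.
        feature_task = DataprocPySparkBatchOp(
            project=project,
            location=location,
            main_python_file_uri=pyspark_uri,
            args=["--output", features_uri],  # hypothetical script argument
        )
        # Step 2: launch training only after feature engineering succeeds.
        train_task = train_job_op(
            project=project,
            location=location,
            features_uri=features_uri,
        )
        train_task.after(feature_task)

    return pipeline
```

The returned function can be compiled with `kfp.compiler.Compiler().compile(pipeline_func=..., package_path=...)` and submitted as a Vertex AI Pipelines run. The explicit `.after()` call is what replaces the manual hand-off between the two steps.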