You have been tasked with deploying prototype code to production. The feature engineering…

Question

You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?

Accepted Answer

Correct answer: C. Use the Kubeflow Pipelines SDK to write code that specifies two components:
- The first is a Dataproc Serverless component that launches the feature engineering job
- The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job
Create a Vertex AI Pipelines job to link and run both components.

C is correct because the Kubeflow Pipelines SDK lets you define the feature engineering and model training steps as components of a single pipeline. Vertex AI Pipelines then runs them end-to-end in the declared order, so training no longer has to be started by hand, and each run can be monitored with the connections between steps recorded. Options A and B do not scale because they rely on manual execution in a notebook, which is also harder to maintain. Option D is similar to C, but it runs the feature engineering through an Apache Spark context instead of the existing Dataproc Serverless job, which is the better fit for this scenario.
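The two-component structure in option C can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes the kfp, google-cloud-pipeline-components, and google-cloud-aiplatform packages are installed, and the pipeline name, the train_model body, the machine type, the "--output" script argument, and every URI passed in are hypothetical placeholders.

```python
PIPELINE_NAME = "feature-eng-and-train"  # hypothetical name


def build_pipeline(pyspark_uri: str, features_uri: str):
    """Return a KFP pipeline function that links both steps.

    SDK imports are deferred into the function so this file can be loaded
    without the SDKs installed; in a real project they would sit at the top.
    """
    from kfp import dsl
    from google_cloud_pipeline_components.v1.custom_job import (
        create_custom_training_job_from_component,
    )
    from google_cloud_pipeline_components.v1.dataproc import (
        DataprocPySparkBatchOp,
    )

    @dsl.component(base_image="python:3.10")
    def train_model(features_uri: str):
        # Placeholder: the real training code would load the features here.
        print(f"Training on features at {features_uri}")

    # Wrap the component so it runs as a Vertex AI custom training job.
    train_op = create_custom_training_job_from_component(
        train_model,
        display_name="train-model",
        machine_type="n1-standard-4",  # assumption; size to the workload
    )

    @dsl.pipeline(name=PIPELINE_NAME)
    def pipeline(project: str, location: str):
        # Step 1: launch the PySpark job on Dataproc Serverless.
        # "--output" is a hypothetical argument of the PySpark script.
        features = DataprocPySparkBatchOp(
            project=project,
            location=location,
            main_python_file_uri=pyspark_uri,
            args=["--output", features_uri],
        )
        # Step 2: start training only after feature engineering finishes.
        train_op(
            project=project,
            location=location,
            features_uri=features_uri,
        ).after(features)

    return pipeline


def compile_and_submit(project, location, pipeline_root, pyspark_uri, features_uri):
    """Compile the pipeline and submit it as a Vertex AI Pipelines job."""
    from kfp import compiler
    from google.cloud import aiplatform

    compiler.Compiler().compile(
        pipeline_func=build_pipeline(pyspark_uri, features_uri),
        package_path="pipeline.yaml",
    )
    aiplatform.init(project=project, location=location)
    aiplatform.PipelineJob(
        display_name=PIPELINE_NAME,
        template_path="pipeline.yaml",
        pipeline_root=pipeline_root,
        parameter_values={"project": project, "location": location},
    ).submit()
```

Calling compile_and_submit with a project ID, a region, a Cloud Storage pipeline root, and the two URIs would submit one run. The explicit .after(features) call is what replaces the manual hand-off between the two steps in this sketch, since the training step takes no output from the Dataproc step directly.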

Google Cloud Professional Machine Learning Engineer — Question 177
