You work with a data engineering team that has developed a pipeline to clean your dataset…

Question

You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?

Accepted Answer

Correct answer: C. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster.

Option C is correct because it is event-driven: Cloud Storage notifies a Pub/Sub topic as soon as a new file is available, and the Cloud Function subscribed to that topic starts the training job immediately. Options A and B require continuous polling or manual triggers, which are less efficient and can delay model updates. Option D relies on scheduled checks, so new data waits until the next scheduled run, which makes it less responsive than the trigger-based approach.
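The trigger-based flow in option C can be sketched as a Pub/Sub-triggered Cloud Function. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the bucket notification uses the JSON payload format, that the Kubeflow Pipelines SDK (`kfp`) is installed in the function's environment, and that the function can reach the Kubeflow Pipelines endpoint on the GKE cluster. The environment variables `KFP_HOST` and `PIPELINE_PACKAGE`, the pipeline parameter `data_uri`, and the experiment name `retraining` are hypothetical names chosen for this example.

```python
import base64
import json
import os


def parse_gcs_notification(event):
    """Return the gs:// URI of a newly finalized object, or None to skip."""
    attributes = event.get("attributes") or {}
    # Cloud Storage sets eventType on every notification; react only to new objects.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    # With the JSON payload format, the message body is base64-encoded object metadata.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return f"gs://{payload['bucket']}/{payload['name']}"


def trigger_training(event, context):
    """Pub/Sub-triggered entry point (background-function signature)."""
    data_uri = parse_gcs_notification(event)
    if data_uri is None:
        return None

    # Imported lazily so the parsing helper can be exercised without the KFP SDK.
    import kfp

    # KFP_HOST is a hypothetical env var holding the Kubeflow Pipelines endpoint.
    client = kfp.Client(host=os.environ["KFP_HOST"])
    client.create_run_from_pipeline_package(
        # PIPELINE_PACKAGE is a hypothetical env var pointing at a compiled pipeline.
        pipeline_file=os.environ.get("PIPELINE_PACKAGE", "pipeline.yaml"),
        # data_uri is a hypothetical pipeline parameter name.
        arguments={"data_uri": data_uri},
        experiment_name="retraining",
    )
    return data_uri
```

Filtering on the `eventType` attribute keeps deletions and metadata updates from starting a training run, so only newly written files refresh the model.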

Google Cloud Professional Machine Learning Engineer — Question 41
