Google Cloud Professional Machine Learning Engineer — Question 273
You are creating an ML pipeline for data processing, model training, and model deployment that uses different Google Cloud services. You have developed code for each individual task, and you expect a high frequency of new files. You now need to create an orchestration layer on top of these tasks. You only want this orchestration pipeline to run if new files are present in your dataset in a Cloud Storage bucket. You also want to minimize the compute node costs. What should you do?
Answer options
- A. Create a pipeline in Vertex AI Pipelines. Configure the first step to compare the contents of the bucket to the last time the pipeline was run. Use the scheduler API to run the pipeline periodically.
- B. Create a Cloud Function that uses a Cloud Storage trigger and deploys a Cloud Composer directed acyclic graph (DAG).
- C. Create a pipeline in Vertex AI Pipelines. Create a Cloud Function that uses a Cloud Storage trigger and deploys the pipeline.
- D. Deploy a Cloud Composer directed acyclic graph (DAG) with a GCSObjectUpdateSensor class that detects when a new file is added to the Cloud Storage bucket.
Correct answer: C
Explanation
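The correct answer is C. Vertex AI Pipelines is serverless, so you pay only for the resources consumed while a pipeline run executes, and a Cloud Function with a Cloud Storage trigger starts a run only when a new file lands in the bucket. Together they meet both requirements: the pipeline runs only on new data, and no compute sits idle between runs. Option A runs on a schedule, so the pipeline starts (and its first step consumes compute) even when no new files exist, and new files wait until the next scheduled run. Options B and D both depend on Cloud Composer, which requires a continuously running environment that is billed whether or not files arrive. In option D the sensor must also keep polling the bucket, which adds to that cost.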
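The trigger in option C can be sketched as a small Cloud Function. This is a minimal illustration, not a definitive implementation: it assumes a 1st gen background function with the `(event, context)` signature, a pipeline already compiled and stored at a template URI, and the `google-cloud-aiplatform` package available at runtime. The environment variable names (`PROJECT_ID`, `REGION`, `PIPELINE_TEMPLATE_URI`, `PIPELINE_ROOT`), the `input_uri` pipeline parameter, and the helper `build_pipeline_parameters` are hypothetical names chosen for this example.

```python
import os


def build_pipeline_parameters(event: dict) -> dict:
    """Map a Cloud Storage event payload to pipeline parameters.

    'input_uri' is a hypothetical parameter name; use whatever your
    compiled pipeline actually declares.
    """
    return {"input_uri": f"gs://{event['bucket']}/{event['name']}"}


def trigger_pipeline(event: dict, context=None) -> None:
    """Entry point, assumed to be wired to an object-finalize event on the bucket."""
    # Imported lazily so the helper above can be tested without the SDK installed.
    from google.cloud import aiplatform

    # Environment variable names below are assumptions for this sketch.
    aiplatform.init(
        project=os.environ["PROJECT_ID"],
        location=os.environ["REGION"],
    )

    job = aiplatform.PipelineJob(
        display_name="gcs-triggered-pipeline",
        template_path=os.environ["PIPELINE_TEMPLATE_URI"],
        pipeline_root=os.environ["PIPELINE_ROOT"],
        parameter_values=build_pipeline_parameters(event),
    )
    # submit() returns without waiting for the run to finish, so the
    # function exits quickly and is billed only for the brief invocation.
    job.submit()
```

Because the function only submits the run and exits, nothing stays running between file uploads, which is the cost property the question asks for.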