Google Cloud Professional Machine Learning Engineer — Question 305
You are building an ML pipeline to process and analyze both streaming and batch datasets. You need the pipeline to handle data validation, preprocessing, model training, and model deployment in a consistent and automated way. You want to design an efficient and scalable solution that captures model training metadata and is easily reproducible. You want to be able to reuse custom components for different parts of your pipeline. What should you do?
Answer options
- A. Use Cloud Composer for distributed processing of batch and streaming data in the pipeline.
- B. Use Dataflow for distributed processing of batch and streaming data in the pipeline.
- C. Use Cloud Build to build and push Docker images for each pipeline component.
- D. Implement an orchestration framework such as Kubeflow Pipelines or Vertex AI Pipelines.
Correct answer: D
Explanation
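The correct answer is D. An ML orchestration framework such as Kubeflow Pipelines or Vertex AI Pipelines is built to manage end-to-end ML workflows. It chains data validation, preprocessing, training, and deployment into one automated pipeline, packages each step as a reusable component that can be shared across pipelines, and records metadata about every run (Vertex AI Pipelines stores artifacts and lineage in Vertex ML Metadata), which makes runs reproducible. Option B (Dataflow) is a good fit for the distributed batch and streaming processing step and can be run as one step inside such a pipeline, but on its own it does not orchestrate training and deployment or capture model training metadata. Option A misapplies Cloud Composer: Composer is a managed Apache Airflow service for general workflow orchestration, not a distributed data processing engine, and it lacks built-in ML metadata and artifact tracking. Option C (Cloud Build) builds and pushes container images, a useful supporting step for packaging custom components, but it does not address orchestration or end-to-end workflow management.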
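To make the orchestration ideas concrete, here is a toy, standard-library-only Python sketch of what such a framework does for you: reusable components wired into a pipeline, with metadata recorded for each step. It is not the Kubeflow Pipelines or Vertex AI Pipelines API. `Pipeline`, `run_step`, `StepRecord`, and the `validate`/`preprocess`/`train` components are hypothetical names, and deployment is omitted for brevity. In KFP, the corresponding pieces are functions decorated with `@dsl.component` composed inside a `@dsl.pipeline` function, and metadata tracking is handled by the platform rather than by user code.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class StepRecord:
    """Metadata captured for one component run (hypothetical schema)."""
    component: str
    inputs: Dict[str, Any]
    output: Any
    input_hash: str
    duration_s: float


@dataclass
class Pipeline:
    """Toy orchestrator: runs components and records metadata for each step."""
    name: str
    records: List[StepRecord] = field(default_factory=list)

    def run_step(self, component: Callable[..., Any], **inputs: Any) -> Any:
        # Hash the serialized inputs so identical reruns can be recognized.
        payload = json.dumps(inputs, sort_keys=True, default=str)
        input_hash = hashlib.sha256(payload.encode()).hexdigest()[:12]
        start = time.perf_counter()
        output = component(**inputs)
        self.records.append(StepRecord(component.__name__, inputs, output,
                                       input_hash, time.perf_counter() - start))
        return output


# Reusable components: plain functions that any pipeline can call.
def validate(rows: List[Optional[float]]) -> List[float]:
    # Stand-in for data validation: drop missing values.
    return [r for r in rows if r is not None]


def preprocess(rows: List[float]) -> List[float]:
    # Min-max scale to [0, 1]; constant input maps to all zeros.
    lo, hi = min(rows), max(rows)
    span = hi - lo
    return [(r - lo) / span if span else 0.0 for r in rows]


def train(rows: List[float]) -> Dict[str, float]:
    # Stand-in for training: the "model" is just the mean feature value.
    return {"mean": sum(rows) / len(rows)}


def ml_pipeline(batch: List[Optional[float]],
                stream_chunk: List[Optional[float]]) -> Tuple[Pipeline, Dict[str, float]]:
    p = Pipeline(name="demo")
    # The same validate component is reused for batch and streaming inputs.
    clean_batch = p.run_step(validate, rows=batch)
    clean_stream = p.run_step(validate, rows=stream_chunk)
    features = p.run_step(preprocess, rows=clean_batch + clean_stream)
    model = p.run_step(train, rows=features)
    return p, model


if __name__ == "__main__":
    pipeline, model = ml_pipeline([1.0, None, 3.0], [5.0])
    print(model)  # {'mean': 0.5}
    for rec in pipeline.records:
        print(rec.component, rec.input_hash)
```

Because each record stores the component name, its inputs, and an input hash, rerunning the pipeline on the same data produces matching hashes. This is a much simplified version of the lineage tracking that a managed framework provides automatically.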