Google Cloud Professional Machine Learning Engineer — Question 285
You are developing a TensorFlow Extended (TFX) pipeline with standard TFX components. The pipeline includes data preprocessing steps. After the pipeline is deployed to production, it will process up to 100 TB of data stored in BigQuery. You need the data preprocessing steps to scale efficiently, publish metrics and parameters to Vertex AI Experiments, and track artifacts by using Vertex ML Metadata. How should you configure the pipeline run?
Answer options
- A. Run the TFX pipeline in Vertex AI Pipelines. Configure the pipeline to use Vertex AI Training jobs with distributed processing.
- B. Run the TFX pipeline in Vertex AI Pipelines. Set the appropriate Apache Beam parameters in the pipeline to run the data preprocessing steps in Dataflow.
- C. Run the TFX pipeline in Dataproc by using the Apache Beam TFX orchestrator. Set the appropriate Vertex AI permissions in the job to publish metadata in Vertex AI.
- D. Run the TFX pipeline in Dataflow by using the Apache Beam TFX orchestrator. Set the appropriate Vertex AI permissions in the job to publish metadata in Vertex AI.
Correct answer: B
Explanation
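The correct answer is B. Vertex AI Pipelines records the artifacts and lineage of each run in Vertex ML Metadata, and a pipeline run can be associated with a Vertex AI Experiment, which covers the tracking requirements. The standard TFX data-processing components (for example ExampleGen, StatisticsGen, and Transform) are built on Apache Beam, so setting the Beam parameters to use the Dataflow runner lets the preprocessing steps scale out on Dataflow to handle up to 100 TB of BigQuery data. Option A is wrong because Vertex AI Training jobs scale model training, not the Beam-based preprocessing steps. Options C and D run the pipeline with the Apache Beam orchestrator outside Vertex AI Pipelines, so they lose the managed integration with Vertex ML Metadata and Vertex AI Experiments, and granting Vertex AI permissions to the job does not by itself replace that integration. Option C also runs on Dataproc rather than Dataflow.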
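As a rough illustration of what "set the appropriate Apache Beam parameters" can look like, the sketch below builds the list of Beam flags that selects the Dataflow runner. The helper name `dataflow_beam_args` and the project, region, and bucket values are hypothetical placeholders, not part of the question. The flags themselves (`--runner`, `--project`, `--region`, `--temp_location`, `--max_num_workers`) are standard Apache Beam pipeline options.

```python
from typing import List, Optional


def dataflow_beam_args(
    project: str,
    region: str,
    temp_location: str,
    max_num_workers: Optional[int] = None,
) -> List[str]:
    """Build Apache Beam flags that select the Dataflow runner.

    Hypothetical helper for illustration; only the flag names are real.
    """
    args = [
        "--runner=DataflowRunner",           # run Beam steps on Dataflow
        f"--project={project}",              # Google Cloud project ID
        f"--region={region}",                # Dataflow regional endpoint
        f"--temp_location={temp_location}",  # Cloud Storage path for temp files
    ]
    if max_num_workers is not None:
        # Optional cap on Dataflow autoscaling.
        args.append(f"--max_num_workers={max_num_workers}")
    return args


# Placeholder values, assumed for the example only.
beam_args = dataflow_beam_args(
    "my-project", "us-central1", "gs://my-bucket/tmp", max_num_workers=100
)
```

In a TFX pipeline definition, a list like this is typically passed as the `beam_pipeline_args` argument of `tfx.v1.dsl.Pipeline`, and the pipeline is then compiled for Vertex AI Pipelines with `tfx.v1.orchestration.experimental.KubeflowV2DagRunner`. The Beam-based components pick up the flags and submit their work to Dataflow, while orchestration and metadata tracking stay in Vertex AI Pipelines.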