Google Cloud Associate Data Practitioner — Question 12
Your company’s ecommerce website collects product reviews from customers. The reviews are loaded as CSV files daily to a Cloud Storage bucket. The reviews are in multiple languages and need to be translated to Spanish. You need to configure a pipeline that is serverless, efficient, and requires minimal maintenance. What should you do?
Answer options
- A. Load the data into BigQuery using Dataproc. Use Apache Spark to translate the reviews by invoking the Cloud Translation API. Set BigQuery as the sink.
- B. Use a Dataflow templates pipeline to translate the reviews using the Cloud Translation API. Set BigQuery as the sink.
- C. Load the data into BigQuery using a Cloud Run function. Use the BigQuery ML CREATE MODEL statement to train a translation model. Use the model to translate the product reviews within BigQuery.
- D. Load the data into BigQuery using a Cloud Run function. Create a BigQuery remote function that invokes the Cloud Translation API. Use a scheduled query to translate new reviews.
Correct answer: B
Explanation
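Option B is correct because Dataflow is a fully managed, serverless service: a templated pipeline can read the daily CSV files from Cloud Storage, call the Cloud Translation API for each review, and write the Spanish text to BigQuery, with no clusters or custom models to maintain. Option A relies on Dataproc, which typically means provisioning and managing Spark clusters, so it adds operational overhead. Option C requires training and maintaining a translation model in BigQuery ML, which is unnecessary when the Cloud Translation API already provides pretrained translation. Option D is feasible, but it combines three separately maintained pieces (a Cloud Run function for loading, a BigQuery remote function that invokes the Cloud Translation API, and a scheduled query), so it has more moving parts than a single Dataflow pipeline.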
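The read, translate, and write flow behind option B can be illustrated with a minimal Python sketch of the per-record logic. This is not a Dataflow template and does not call Google Cloud: the column names `review_id` and `review_text`, the function names, and the `fake_translate` stand-in are all assumptions made for illustration. In a real pipeline the injected `translate` callable would presumably wrap a Cloud Translation API client, and the output rows would go to a BigQuery sink.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical column name; the question does not specify the CSV schema.
TEXT_COLUMN = "review_text"


def parse_reviews(csv_text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV data row, keyed by the header row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def translate_reviews(
    rows: Iterable[Dict[str, str]],
    translate: Callable[[str, str], str],
    target_language: str = "es",
) -> Iterator[Dict[str, str]]:
    """Copy each row and add a translated-text column.

    `translate` is injected so the same logic can run against a stub
    (as here) or against a real translation client (an assumption).
    """
    for row in rows:
        out = dict(row)
        out[f"{TEXT_COLUMN}_{target_language}"] = translate(
            row[TEXT_COLUMN], target_language
        )
        yield out


def fake_translate(text: str, target_language: str) -> str:
    """Stand-in for a translation call; it only tags the text."""
    return f"[{target_language}] {text}"


# Usage with an in-memory CSV sample instead of a Cloud Storage object.
sample_csv = "review_id,review_text\n1,Great product\n2,Sehr gut\n"
translated = list(translate_reviews(parse_reviews(sample_csv), fake_translate))
```

Keeping the translation call behind a plain callable means the row-level logic can be tested locally, while the managed service handles scaling and scheduling.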