Google Cloud Associate Data Practitioner — Question 34
You are predicting customer churn for a subscription-based service. You have a 50 PB historical customer dataset in BigQuery that includes demographics, subscription information, and engagement metrics. You want to build a churn prediction model with minimal overhead. You want to follow the Google-recommended approach. What should you do?
Answer options
- A. Export the data from BigQuery to a local machine. Use scikit-learn in a Jupyter notebook to build the churn prediction model.
- B. Use Dataproc to create a Spark cluster. Use Spark MLlib within the cluster to build the churn prediction model.
- C. Create a Looker dashboard that is connected to BigQuery. Use LookML to predict churn.
- D. Use the BigQuery Python client library in a Jupyter notebook to query and preprocess the data in BigQuery. Use the CREATE MODEL statement in BigQuery ML to train the churn prediction model.
Correct answer: D
Explanation
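The correct answer is D. BigQuery ML trains the model where the data already lives: the CREATE MODEL statement runs inside BigQuery, so there is no data export and no infrastructure to provision. The Python client library in the notebook only submits queries, while BigQuery does the processing. Option A is impractical because a 50 PB dataset cannot be exported to, or processed on, a local machine. Option B could work, but creating, sizing, and managing a Dataproc cluster adds exactly the overhead the question asks you to avoid. Option C is wrong because LookML defines dimensions, measures, and dashboards for analysis and visualization; it does not itself train a predictive model.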
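To make option D concrete, here is a minimal sketch of what the notebook code might look like. The dataset, table, model, and column names (`my_dataset.customers`, `my_dataset.churn_model`, `churned`, and so on) are hypothetical, as are the two helper functions, and logistic regression is only one of several model types that could suit churn prediction. The example assumes the `google-cloud-bigquery` package is installed and that default credentials and a project are configured.

```python
def build_create_model_sql(model_id, source_table, label_column, feature_columns):
    """Build a BigQuery ML CREATE MODEL statement (hypothetical helper).

    All identifiers passed in are illustrative; substitute your own
    dataset, table, and column names.
    """
    features = ",\n    ".join(feature_columns)
    return (
        f"CREATE OR REPLACE MODEL `{model_id}`\n"
        "OPTIONS (\n"
        "  model_type = 'LOGISTIC_REG',\n"
        f"  input_label_cols = ['{label_column}']\n"
        ")\n"
        "AS\n"
        "SELECT\n"
        f"    {features},\n"
        f"    {label_column}\n"
        f"FROM `{source_table}`"
    )


def train_churn_model(sql):
    """Submit the statement to BigQuery and wait for training to finish."""
    # Imported here so the SQL builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    job = client.query(sql)     # training runs inside BigQuery
    job.result()                # block until the job completes
    return job


if __name__ == "__main__":
    sql = build_create_model_sql(
        model_id="my_dataset.churn_model",        # hypothetical model name
        source_table="my_dataset.customers",      # hypothetical table
        label_column="churned",                   # hypothetical label column
        feature_columns=["age", "plan_type", "monthly_logins"],
    )
    print(sql)
    # train_churn_model(sql)  # uncomment to run against a real project
```

Only the SQL text and job status travel between the notebook and BigQuery, so the 50 PB of training data never leaves the warehouse.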