Google Cloud Professional Data Engineer — Question 326
You work for an advertising company, and you've developed a Spark ML model to predict click-through rates for advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be closing soon, so a rapid lift-and-shift migration is necessary. However, the data you've been using will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?
Answer options
- A. Use Vertex AI for training existing Spark ML models
- B. Rewrite your models on TensorFlow, and start using Vertex AI
- C. Use Dataproc for training existing Spark ML models, but start reading data directly from BigQuery
- D. Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery
Correct answer: C
Explanation
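C is correct because Dataproc is Google Cloud's managed service for Spark and Hadoop, so the existing Spark ML training pipelines can run there with minimal changes, which suits a rapid lift-and-shift. The Spark BigQuery connector lets those jobs read the training data directly from BigQuery, with no intermediate export step. Option A is incorrect because Vertex AI is not specifically designed for Spark ML models, so the existing pipelines could not simply be moved over. Option B requires a complete rewrite of the models, which is unnecessary and too slow for a rapid migration. Option D adds overhead: you would have to provision and manage a Spark cluster on Compute Engine yourself and export the data from BigQuery before training, which is less efficient than using Dataproc.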
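As a rough illustration of option C, the sketch below shows what a retraining job on Dataproc might look like when it reads from BigQuery through the Spark BigQuery connector. It is a minimal sketch under assumptions, not the pipeline from the question: the project, dataset, table, and column names are hypothetical, the logistic regression stage stands in for whatever Spark ML model is actually used, and the helper names `bigquery_read_options` and `train` are invented for this example.

```python
"""Sketch: retrain a Spark ML model on Dataproc, reading from BigQuery."""


def bigquery_read_options(project: str, dataset: str, table: str) -> dict:
    """Build reader options for the Spark BigQuery connector.

    All three arguments are hypothetical placeholders for your own names.
    """
    return {"table": f"{project}.{dataset}.{table}"}


def train(table_options: dict, feature_cols: list, label_col: str):
    """Read the training table from BigQuery and fit a Spark ML pipeline."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ctr-retrain").getOrCreate()

    # "bigquery" is the data source name registered by the connector.
    df = spark.read.format("bigquery").options(**table_options).load()

    # Assumed model: a logistic regression over numeric feature columns.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol=label_col)
    return Pipeline(stages=[assembler, lr]).fit(df)
```

A driver script could call `train(bigquery_read_options("my-project", "ads", "click_events"), ["ad_position", "hour_of_day"], "clicked")` and be submitted with `gcloud dataproc jobs submit pyspark`. Depending on the Dataproc image version, the connector may already be on the cluster or may need to be supplied as a jar when submitting the job. Apart from the read call, the Spark ML stages stay the same as they would on-premises, which is why this option fits a lift-and-shift.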