Google Cloud Professional Data Engineer — Question 239
You want to migrate an Apache Spark 3 batch job from on-premises to Google Cloud. You need to minimally change the job so that it reads from Cloud Storage and writes the result to BigQuery. Your job is optimized for Spark, where each executor has 8 vCPUs and 16 GB of memory, and you want to be able to choose similar settings. You want to minimize installation and management effort to run your job. What should you do?
Answer options
- A. Execute the job as part of a deployment in a new Google Kubernetes Engine cluster.
- B. Execute the job from a new Compute Engine VM.
- C. Execute the job in a new Dataproc cluster.
- D. Execute as a Dataproc Serverless job.
Correct answer: D
Explanation
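The correct answer is D. Dataproc Serverless runs Spark batch workloads without requiring you to create, size, or maintain a cluster, which satisfies the requirement to minimize installation and management effort. It also fits the other constraints in the question: the job can keep its Spark logic and typically only needs its input paths switched to gs:// URIs and its output written through the Spark BigQuery connector, and executor sizing is set through standard Spark properties such as spark.executor.cores and spark.executor.memory, so the 8 vCPU and 16 GB layout can be approximated. Options A and B require you to install and operate Spark yourself, on Kubernetes or on a VM. Option C provides managed Spark with full control over executor settings, but you must still create, configure, and eventually delete the cluster.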
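
As an illustration of how the executor sizing from the question might carry over to option D, here is a minimal sketch that assembles a `gcloud dataproc batches submit` command with the two Spark properties set. The helper name `build_batch_submit_command`, the bucket path, and the region are hypothetical, and the sketch assumes the job is a PySpark script; a JVM job would use the `spark` subcommand with its own jar or class arguments instead. Whether a given core and memory combination is accepted depends on the limits of the Dataproc Serverless runtime, so treat the values as a starting point rather than a guaranteed configuration.

```python
import shlex


def build_batch_submit_command(main_file, region, executor_cores=8, executor_memory="16g"):
    """Assemble (but do not run) a Dataproc Serverless batch submission command.

    main_file and region are placeholders supplied by the caller; the defaults
    mirror the executor size described in the question (8 vCPUs, 16 GB).
    """
    # Standard Spark properties; Dataproc Serverless reads them from --properties.
    properties = {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
    }
    props_arg = ",".join(f"{key}={value}" for key, value in properties.items())
    return [
        "gcloud", "dataproc", "batches", "submit", "pyspark", main_file,
        f"--region={region}",
        f"--properties={props_arg}",
    ]


if __name__ == "__main__":
    # Hypothetical script location and region, for illustration only.
    cmd = build_batch_submit_command("gs://example-bucket/jobs/job.py", "us-central1")
    print(shlex.join(cmd))
```

Building the command as a list keeps the sketch testable without a Google Cloud project; in practice the same flags could be typed directly in a shell or set through the Dataproc API.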