Google Cloud Associate Data Practitioner — Question 13
You have a Dataproc cluster that performs batch processing on data stored in Cloud Storage. You need to schedule a daily Spark job to generate a report that will be emailed to stakeholders. You need a fully managed solution that is easy to implement and minimizes complexity. What should you do?
Answer options
- A. Use Cloud Composer to orchestrate the Spark job and email the report.
- B. Use Dataproc workflow templates to define and schedule the Spark job, and to email the report.
- C. Use Cloud Run functions to trigger the Spark job and email the report.
- D. Use Cloud Scheduler to trigger the Spark job, and use Cloud Run functions to email the report.
Correct answer: B
Explanation
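The answer key selects B. Dataproc workflow templates are a Dataproc-native, fully managed way to define a Spark job (or a graph of jobs) and run it on an existing or ephemeral cluster, so no separate orchestration service has to be deployed. Note that a workflow template does not run on a timer by itself and has no built-in email feature: a daily run is typically started by an external trigger such as Cloud Scheduler calling the template's instantiate method, and the email has to be sent by the job itself or by a downstream step. Option A would work, but Cloud Composer is a full Apache Airflow environment and adds more setup and operational overhead than a single daily job needs. Options C and D rely on Cloud Run functions, which means writing and maintaining custom code for triggering or emailing instead of using what Dataproc already runs natively.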
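As a rough illustration of what option B involves in practice, the Python sketch below builds a workflow template body that runs one Spark job on an existing cluster (selected by label), plus the REST URL that a daily trigger could POST to in order to instantiate it. It makes no network calls. The project ID, region, template ID, cluster label, main class, JAR path, and cron time are all hypothetical placeholders, and the field names follow the Dataproc v1 REST JSON shape as I understand it, so check them against the current API reference before relying on them.

```python
import json

DATAPROC_API = "https://dataproc.googleapis.com/v1"

# Unix-cron expression for "every day at 06:00" (hypothetical schedule).
DAILY_AT_6AM = "0 6 * * *"


def build_template(template_id, cluster_labels, main_class, jar_uri):
    """Return a workflow template body with a single Spark job step.

    The job runs on an existing cluster matched by its labels instead of
    a managed (ephemeral) cluster.
    """
    return {
        "id": template_id,
        "placement": {"clusterSelector": {"clusterLabels": cluster_labels}},
        "jobs": [
            {
                "stepId": "daily-report",
                "sparkJob": {"mainClass": main_class, "jarFileUris": [jar_uri]},
            }
        ],
    }


def instantiate_url(project, region, template_id):
    """Return the REST URL a scheduler would POST to in order to run the template."""
    return (
        f"{DATAPROC_API}/projects/{project}/regions/{region}"
        f"/workflowTemplates/{template_id}:instantiate"
    )


if __name__ == "__main__":
    # All values below are placeholders, not taken from the question.
    template = build_template(
        template_id="daily-report",
        cluster_labels={"env": "batch"},
        main_class="com.example.ReportJob",
        jar_uri="gs://example-bucket/jobs/report.jar",
    )
    print(json.dumps(template, indent=2))
    print(DAILY_AT_6AM, instantiate_url("example-project", "us-central1", "daily-report"))
```

The sketch keeps the schedule outside the template on purpose: the template only describes what to run, while the daily trigger and the email step are separate concerns.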