Google Cloud Professional Data Engineer — Question 45
Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do?
Answer options
- A. Create a Google Cloud Dataflow job to process the data.
- B. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.
- C. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.
- D. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.
- E. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.
Correct answer: D
Explanation
The correct answer is D because using a Cloud Dataproc cluster with the Google Cloud Storage connector allows for the reuse of existing Hadoop jobs and ensures data persistence after the cluster is terminated. Options A and C do not meet the requirement of minimizing management or ensuring data persistence effectively, while B does not utilize the Google Cloud Storage connector. Option E uses Local SSDs, which do not provide the same level of persistence as Google Cloud Storage.