Google Cloud Professional Data Engineer — Question 60
Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for- like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?
Answer options
- A. Put the data into Google Cloud Storage.
- B. Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.
- C. Tune the Cloud Dataproc cluster so that there is just enough disk for all data.
- D. Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.
Correct answer: A
Explanation
The correct choice is A, as Google Cloud Storage is more cost-effective for storing large amounts of data compared to Google Persistent Disk. Options B and C do not address the storage cost issue directly, while D may help but still incurs costs for the Persistent Disk used for hot data.