Google Cloud Professional Data Engineer — Question 222

You have an upstream process that writes data to Cloud Storage. This data is then read by an Apache Spark job that runs on Dataproc. These jobs are run in the us-central1 region, but the data could be stored anywhere in the United States. You need to have a recovery process in place in case of a catastrophic single region failure. You need an approach with a maximum of 15 minutes of data loss (RPO=15 mins). You want to ensure that there is minimal latency when reading the data. What should you do?

Answer options

Correct answer: D

Explanation

Option D is correct because it uses a dual-region bucket with turbo replication, ensuring data is available in both regions while maintaining low latency by reading from the local bucket. Options A and C involve using a separate bucket for recovery, which could introduce latency issues, while option B relies on a multi-region bucket but does not optimize for the required RPO and recovery efficiency.