Google Cloud Professional Data Engineer — Question 164

You need to modernize your existing on-premises data strategy. Your organization currently uses:
• Apache Hadoop clusters for processing multiple large data sets, including on-premises Hadoop Distributed File System (HDFS) for data replication.
• Apache Airflow to orchestrate hundreds of ETL pipelines with thousands of job steps.

You need to set up a new architecture in Google Cloud that can handle your Hadoop workloads and requires minimal changes to your existing orchestration processes. What should you do?

Answer options

Correct answer: B

Explanation

The correct answer is B because Dataproc allows for the seamless migration of Hadoop clusters to Google Cloud with minimal changes to existing workflows, while Cloud Storage can effectively replace HDFS for data storage. The other options suggest using different services like Bigtable or Dataflow, which would require more significant changes to the existing architecture and orchestration processes.