Google Cloud Professional Data Engineer — Question 104

You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC).
All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster's local Hadoop Distributed File System
(HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

Answer options

Correct answer: A, D

Explanation

Option A is correct because it suggests transferring the ORC files directly to HDFS, which is essential for Hive performance. Option D is also correct as it leverages the Cloud Storage connector to mount ORC files as external Hive tables, allowing efficient data access. The other options either do not prioritize HDFS or suggest unnecessary steps that complicate the process.