You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary t…

Question

You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC).
All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster's local Hadoop Distributed File System
(HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

Accepted Answer

Correct answer: A, D. A. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally. — D. Leverage Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive tables to the native ones. — Option A is correct because it suggests transferring the ORC files directly to HDFS, which is essential for Hive performance. Option D is also correct as it leverages the Cloud Storage connector to mount ORC files as external Hive tables, allowing efficient data access. The other options either do not prioritize HDFS or suggest unnecessary steps that complicate the process.

Google Cloud Professional Data Engineer — Question 104

Answer options

Correct answer: A, D

Explanation