You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop c…

Question

You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?

Accepted Answer

Correct answer: A. A. PigLatin using Pig — The correct answer is A, as PigLatin provides a high-level scripting language specifically designed for processing large datasets on Hadoop, with built-in support for checkpointing and pipeline splitting. Options B, C, and D are less suitable for this specific task since HiveQL is more focused on data querying, and both Java and Python using MapReduce require more complex coding without the same level of abstraction for these features.

Google Cloud Professional Data Engineer — Question 91

Answer options

Correct answer: A

Explanation