Google Cloud Professional Data Engineer — Question 91
You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster. The pipelines will require some checkpointing and pipeline splitting. Which method should you use to write the pipelines?
Answer options
- A. Pig Latin using Pig
- B. HiveQL using Hive
- C. Java using MapReduce
- D. Python using MapReduce
Correct answer: A
Explanation
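The correct answer is A. Pig Latin is a high-level dataflow language designed for processing large datasets on Hadoop, and it fits both requirements: intermediate results can be written out with STORE at chosen points in a script (checkpointing), and the SPLIT operator divides one data stream into several branches (pipeline splitting). Option B is less suitable because HiveQL is a declarative, SQL-like language geared toward querying data rather than describing multi-step pipelines. Options C and D are less suitable because MapReduce, whether written in Java or in Python (typically through Hadoop Streaming), works at a lower level of abstraction: checkpointing and splitting have to be hand-coded by chaining jobs and managing intermediate outputs yourself.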
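To make the two requirements concrete, here is a minimal plain-Python sketch of a pipeline that checkpoints an intermediate result and then splits it into two branches. It is not Pig Latin and says nothing about how Pig works internally; the function names (`checkpoint`, `load_checkpoint`, `split`, `run_pipeline`), the record fields, and the threshold of 100 are all illustrative assumptions. In a Pig script, the corresponding steps would roughly be a STORE statement placed mid-script and the SPLIT operator.

```python
import json
import os
import tempfile


def checkpoint(records, path):
    """Persist intermediate records so a later stage can restart from here."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    return records


def load_checkpoint(path):
    """Read back records previously written by checkpoint()."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split(records, predicate):
    """Route each record to one of two branches based on a predicate."""
    matched, rest = [], []
    for record in records:
        (matched if predicate(record) else rest).append(record)
    return matched, rest


def run_pipeline(raw, checkpoint_path):
    # Stage 1: clean the input by dropping records with no amount.
    cleaned = [r for r in raw if r.get("amount") is not None]

    # Checkpoint: save the cleaned data before any further processing.
    checkpoint(cleaned, checkpoint_path)

    # Stage 2: resume from the checkpoint and split into two branches.
    restored = load_checkpoint(checkpoint_path)
    large, small = split(restored, lambda r: r["amount"] >= 100)
    return large, small


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    raw = [
        {"id": 1, "amount": 250},
        {"id": 2, "amount": None},
        {"id": 3, "amount": 40},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        large, small = run_pipeline(raw, os.path.join(tmp, "cleaned.json"))
        print("large:", large)
        print("small:", small)
```

If stage 2 failed, it could be rerun from the saved file without repeating stage 1, which is the point of a checkpoint. With raw MapReduce, the same behavior means writing and chaining separate jobs by hand, which is why the higher-level option is preferred here.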