You architect a system to analyze seismic data. Your extract, transform, and load (ETL) p…

Question

You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

Accepted Answer

Correct answer: B. B. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this. — The correct answer is B because introducing a new MapReduce job specifically for sensor calibration ensures that this crucial step is performed on the raw data before any further processing occurs. Modifying the existing transformMapReduce jobs (option A) could disrupt the current workflow, while options C and D do not adequately integrate sensor calibration into the ETL process, leaving it to users or relying on predictions instead of systematic application.

Google Cloud Professional Data Engineer — Question 82

Answer options

Correct answer: B

Explanation