Question

An organization uses Amazon Elastic MapReduce (EMR) to process a series of extract-transform-load (ETL) steps that run in sequence. The output of each step must be fully processed in subsequent steps but will not be retained.
Which of the following techniques will meet this requirement most efficiently?

Accepted Answer

Correct answer: B. Use the s3n URI to store the data to be processed as objects in Amazon S3.

The correct answer is B because the s3n URI lets each step write its intermediate output as objects in Amazon S3, where the next step in the sequence can read and fully process it. Options A and D add unnecessary complexity by relying on HDFS and EMRFS, which are not needed because the data will not be retained. Option C adds another management layer, AWS Data Pipeline, which this scenario does not require.
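To make the chaining concrete, here is a minimal sketch that builds EMR step definitions in which each step reads the s3n:// prefix written by the step before it. The dictionaries follow the step shape accepted by the EMR `AddJobFlowSteps` API (`Name`, `ActionOnFailure`, `HadoopJarStep` with `command-runner.jar`). The bucket, run ID, prefix layout, the `build_etl_steps` helper, and the idea that each script takes an input and an output path as arguments are all hypothetical assumptions for illustration.

```python
"""Sketch (hypothetical layout): chain sequential EMR ETL steps through s3n:// prefixes."""

from typing import Dict, List


def build_etl_steps(bucket: str, run_id: str, scripts: List[str]) -> List[Dict]:
    """Return EMR step definitions where step i reads the output of step i - 1.

    Assumption: each script is a Spark job invoked as `script <input> <output>`.
    """
    if not scripts:
        raise ValueError("at least one ETL script is required")

    base = f"s3n://{bucket}/{run_id}"  # hypothetical per-run prefix
    last = len(scripts) - 1
    steps = []
    for i, script in enumerate(scripts):
        # The first step reads raw input; later steps read the previous step's output.
        src = f"{base}/input/" if i == 0 else f"{base}/intermediate/step-{i - 1}/"
        # Only the final step writes to the output prefix; the rest are intermediate.
        dst = f"{base}/output/" if i == last else f"{base}/intermediate/step-{i}/"
        steps.append({
            "Name": f"etl-step-{i}",
            # Later steps depend on this output, so cancel the rest of the chain on failure.
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script, src, dst],
            },
        })
    return steps


if __name__ == "__main__":
    for step in build_etl_steps("example-etl-bucket", "run-001",
                                ["extract.py", "transform.py", "load.py"]):
        print(step["Name"], step["HadoopJarStep"]["Args"][2:])
    # Submitting requires AWS credentials and a running cluster, e.g.:
    # boto3.client("emr").add_job_flow_steps(JobFlowId="j-EXAMPLE", Steps=steps)
```

Because Amazon S3 keeps objects until they are deleted, the `intermediate/` prefix in this sketch would still need an explicit delete, or an S3 lifecycle expiration rule on that prefix, once the run completes.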

AWS Certified Big Data – Specialty — Question 26
