AWS Certified Big Data – Specialty — Question 44

An organization uses a custom MapReduce application to build monthly reports based on many small data files in an Amazon S3 bucket. The data is submitted from various business units on a frequent but unpredictable schedule. As the dataset continues to grow, it becomes increasingly difficult to process all of the data in one day. The organization has scaled up its Amazon EMR cluster, but other optimizations could improve performance.
The organization needs to improve performance with minimal changes to existing processes and applications.
What action should the organization take?

Answer options

Correct answer: B

Explanation

B is the correct answer: adding Spark to the Amazon EMR cluster enables in-memory processing with Resilient Distributed Datasets (RDDs), which can substantially speed up jobs by keeping intermediate data in memory rather than writing it to disk between stages. The other options either require more complex changes or do not directly improve processing speed for the existing application.
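
To make the in-memory idea concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any AWS API; it only illustrates the pattern of loading records once, keeping them in memory, and reusing them across several aggregations. The file names, contents, and helper functions (`read_file`, `parse`, `load_all`, `total_by`) are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for many small files in an S3 bucket: name -> contents.
SMALL_FILES = {
    "unit_a/2024-01-03.csv": "sales,120\nreturns,5\n",
    "unit_a/2024-01-17.csv": "sales,80\n",
    "unit_b/2024-01-09.csv": "sales,200\nreturns,12\n",
}

read_count = 0  # number of simulated storage reads


def read_file(name):
    """Simulate one (slow) read from object storage."""
    global read_count
    read_count += 1
    return SMALL_FILES[name]


def parse(name):
    """Parse one file into (unit, metric, value) records."""
    unit = name.split("/")[0]
    records = []
    for line in read_file(name).splitlines():
        metric, value = line.split(",")
        records.append((unit, metric, int(value)))
    return records


def load_all():
    """Read and parse every file exactly once."""
    return [rec for name in sorted(SMALL_FILES) for rec in parse(name)]


def total_by(records, key_index):
    """Sum the value field, grouped by the field at key_index."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)


# Load once and keep the records in memory, then run several
# aggregations over the same in-memory data without re-reading files.
cached = load_all()
by_unit = total_by(cached, 0)
by_metric = total_by(cached, 1)
```

Both reports are computed from the same in-memory list, so each file is read only once. In Spark, a similar effect is typically obtained by persisting an RDD before running multiple actions on it, although the details depend on how the existing application is ported.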