AWS Certified Data Engineer – Associate (DEA-C01) — Question 119

A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company’s long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.

When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.

The company wants to modify the EMR cluster configuration to reduce the EMR costs to run the daily ETL job.

Which solution will meet these requirements MOST cost-effectively?

Answer options

Correct answer: C

Explanation

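Switching the task nodes to compute optimized EC2 instances (option C) is the most cost-effective choice. The workload is CPU-bound: the cluster reaches maximum CPU usage while memory usage stays under 30%, so the general purpose instances are paying for memory that the ETL job does not use. Compute optimized instances pair each vCPU with less memory (for example, about 2 GiB per vCPU in the C family compared with 4 GiB per vCPU in the M family) and typically cost less per vCPU, so the same CPU capacity costs less. Increasing the maximum number of task nodes (option A) adds more of the same general purpose capacity, which raises costs instead of reducing them. Switching to memory optimized instances (option B) moves in the wrong direction because it adds even more unused memory per vCPU at a higher price. Reducing the scaling cooldown period (option D) only changes how quickly the cluster scales. The cluster already scales to five nodes quickly, so this change does not address the CPU bottleneck or the cost of unused memory.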
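A quick way to check that the compute optimized family still has enough memory is to carry the observed utilization over to the new memory-per-vCPU ratio. The sketch below is a back-of-the-envelope estimate, not an AWS tool. It assumes approximate ratios for common current-generation families (M: 4 GiB per vCPU, C: 2 GiB per vCPU, R: 8 GiB per vCPU). It also assumes that Spark's memory demand per vCPU stays the same when the same number of vCPUs runs on a different family. The function name `projected_memory_utilization` is hypothetical.

```python
# Approximate GiB of memory per vCPU for common current-generation families
# (e.g. m5.xlarge: 4 vCPU / 16 GiB, c5.xlarge: 4 vCPU / 8 GiB, r5.xlarge: 4 vCPU / 32 GiB).
# These ratios are an illustrative assumption; check the exact instance sizes you use.
GIB_PER_VCPU = {
    "general_purpose": 4.0,
    "compute_optimized": 2.0,
    "memory_optimized": 8.0,
}


def projected_memory_utilization(observed_utilization: float,
                                 current_family: str,
                                 target_family: str) -> float:
    """Estimate memory utilization after moving the same vCPU count to another family.

    Hypothetical helper. Assumes the workload's memory demand per vCPU is unchanged,
    so only the memory available per vCPU differs between families.
    """
    # Memory actually used per vCPU on the current instances.
    demand_gib_per_vcpu = observed_utilization * GIB_PER_VCPU[current_family]
    # Fraction of the target family's per-vCPU memory that the same demand would fill.
    return demand_gib_per_vcpu / GIB_PER_VCPU[target_family]


if __name__ == "__main__":
    # Memory usage in the scenario stays under 30% on general purpose instances.
    for family in GIB_PER_VCPU:
        util = projected_memory_utilization(0.30, "general_purpose", family)
        print(f"{family}: {util:.0%} memory utilization")
```

Under these assumptions, the 30% memory usage on general purpose nodes becomes roughly 60% on compute optimized nodes, which still leaves headroom. On memory optimized nodes the same demand would fill only about 15% of the memory, which shows how much unused memory option B would pay for.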