AWS Certified Data Engineer – Associate (DEA-C01) — Question 119
A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company’s long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.
When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.
The company wants to modify the EMR cluster configuration to reduce the EMR costs to run the daily ETL job.
Which solution will meet these requirements MOST cost-effectively?
Answer options
- A. Increase the maximum number of task nodes for EMR managed scaling to 10.
- B. Change the task node type from general purpose EC2 instances to memory optimized EC2 instances.
- C. Switch the task node type from general purpose EC2 instances to compute optimized EC2 instances.
- D. Reduce the scaling cooldown period for the provisioned EMR cluster.
Correct answer: C
Explanation
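Switching to compute optimized EC2 instances (option C) is the best choice. The job is CPU-bound: the cluster reaches maximum CPU usage while memory usage stays under 30%, so most of the memory on the general purpose instances is paid for but unused. Compute optimized instances provide less memory per vCPU and typically cost less than general purpose instances with the same vCPU count, so the job gets the CPU it needs at a lower cost. Increasing the maximum number of task nodes (option A) would add CPU capacity, but only by paying for more nodes of the same mismatched type, which raises costs instead of reducing them. Switching to memory optimized instances (option B) adds memory that the job does not use and does nothing for the CPU bottleneck. Reducing the scaling cooldown period (option D) affects only how quickly the cluster scales; the cluster already reaches five nodes quickly, so this does not address the underlying CPU utilization issue.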
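
The reasoning above can be sketched as a small sizing check. This is a minimal illustration, not an AWS API: the function names, the 0.9 CPU threshold, and the 0.8 memory headroom are hypothetical values chosen for the example. The memory-per-vCPU ratios are the usual ones for the x86 families (for example c5, m5, and r5).

```python
# GiB of memory per vCPU for common x86 instance families (e.g. c5, m5, r5).
GIB_PER_VCPU = {
    "compute_optimized": 2,
    "general_purpose": 4,
    "memory_optimized": 8,
}


def projected_memory_utilization(current_family, target_family, current_mem_util):
    """Estimate memory utilization after moving to target_family
    while keeping the same number of vCPUs."""
    ratio = GIB_PER_VCPU[current_family] / GIB_PER_VCPU[target_family]
    return current_mem_util * ratio


def recommend_family(cpu_util, mem_util, current_family="general_purpose",
                     cpu_bound=0.9, mem_headroom=0.8):
    """Hypothetical heuristic: thresholds are assumptions for illustration only."""
    if cpu_util >= cpu_bound:
        # CPU-bound: move to compute optimized only if the working set still fits.
        projected = projected_memory_utilization(
            current_family, "compute_optimized", mem_util)
        if projected <= mem_headroom:
            return "compute_optimized"
    if mem_util >= cpu_bound:
        # Memory-bound: more memory per vCPU is the better fit.
        return "memory_optimized"
    return current_family


# Scenario from the question: CPU at maximum, memory under 30%.
print(recommend_family(cpu_util=1.0, mem_util=0.30))
```

With the question's numbers, moving from 4 GiB to 2 GiB per vCPU roughly doubles memory utilization to about 60%, which still leaves headroom, so the sketch selects the compute optimized family.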