AWS Certified Solutions Architect – Associate (SAA-C03) — Question 614

A company has a large data workload that runs for 6 hours each day. The company cannot lose any data while the process is running. A solutions architect is designing an Amazon EMR cluster configuration to support this critical data workload.

Which solution will meet these requirements MOST cost-effectively?

Answer options

Correct answer: B

Explanation

A transient Amazon EMR cluster is the most cost-effective choice because the workload runs for only 6 hours a day; a long-running cluster would keep incurring charges while sitting idle for the remaining 18 hours. To prevent data loss, the primary and core nodes must run on On-Demand Instances: the primary node coordinates the cluster, and the core nodes store the HDFS data, so a Spot interruption of either could lose data or end the job. Task nodes do not store HDFS data, so they are well suited to cheaper Spot Instances; losing one reduces processing capacity but does not affect data integrity.
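
To make the node layout concrete, here is a minimal sketch in Python of how such a cluster could be described. It builds the dictionary that boto3's EMR run_job_flow call accepts as its Instances argument. The instance types, node counts, and the helper names build_instances_config and spot_roles are illustrative assumptions and do not come from the question. The sketch only constructs and inspects the configuration; it does not call AWS.

```python
# Sketch of the "Instances" argument for boto3's EMR run_job_flow call.
# Instance types and counts below are assumptions chosen for illustration.


def build_instances_config(task_node_count=4):
    """Return an Instances dict for a transient EMR cluster (hypothetical helper)."""
    return {
        "InstanceGroups": [
            {
                # Primary node: On-Demand, because it coordinates the cluster.
                "Name": "Primary",
                "InstanceRole": "MASTER",
                "Market": "ON_DEMAND",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            },
            {
                # Core nodes: On-Demand, because they store HDFS data.
                "Name": "Core",
                "InstanceRole": "CORE",
                "Market": "ON_DEMAND",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 2,
            },
            {
                # Task nodes: Spot, because they hold no HDFS data.
                "Name": "Task",
                "InstanceRole": "TASK",
                "Market": "SPOT",
                "InstanceType": "m5.xlarge",
                "InstanceCount": task_node_count,
            },
        ],
        # False makes the cluster transient: it terminates once its steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
    }


def spot_roles(instances):
    """List the node roles that are placed on Spot capacity (hypothetical helper)."""
    return sorted(
        group["InstanceRole"]
        for group in instances["InstanceGroups"]
        if group["Market"] == "SPOT"
    )


config = build_instances_config()
print(spot_roles(config))
```

In a real deployment, this dictionary would be passed as Instances=config alongside the other run_job_flow arguments such as the cluster name, release label, and IAM roles, which are omitted here.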