AWS Certified Solutions Architect – Associate (SAA-C03) — Question 614
A company has a large data workload that runs for 6 hours each day. The company cannot lose any data while the process is running. A solutions architect is designing an Amazon EMR cluster configuration to support this critical data workload.
Which solution will meet these requirements MOST cost-effectively?
Answer options
- A. Configure a long-running cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.
- B. Configure a transient cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.
- C. Configure a transient cluster that runs the primary node on an On-Demand Instance and the core nodes and task nodes on Spot Instances.
- D. Configure a long-running cluster that runs the primary node on an On-Demand Instance, the core nodes on Spot Instances, and the task nodes on Spot Instances.
Correct answer: B
Explanation
A transient Amazon EMR cluster is the most cost-effective choice because the workload runs for only 6 hours a day. A long-running cluster would keep incurring charges for the remaining 18 idle hours, which rules out options A and D. To prevent data loss, the primary node and core nodes must run on On-Demand Instances: core nodes host the HDFS data, so a Spot interruption could lose data, and in a single-primary cluster losing the primary node terminates the whole cluster. That rules out option C. Task nodes do not store HDFS data, so they are well suited to cheaper Spot Instances. If a task node is reclaimed, the cluster loses compute capacity but no data.
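As a rough illustration of option B, the sketch below builds a request body in the shape accepted by boto3's EMR `run_job_flow` call. The cluster name, release label, instance types, node counts, and the helper `build_transient_cluster_config` are assumptions chosen for illustration and do not come from the question. `EMR_DefaultRole` and `EMR_EC2_DefaultRole` are the default EMR role names and may differ in a given account. The EMR API still uses `MASTER` as the role name for the primary node.

```python
def build_transient_cluster_config(core_count=2, task_count=4):
    """Return a run_job_flow-style request body for option B (illustrative values)."""
    return {
        "Name": "daily-six-hour-workload",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",           # assumed release label
        "ServiceRole": "EMR_DefaultRole",       # default role name, may differ
        "JobFlowRole": "EMR_EC2_DefaultRole",   # default role name, may differ
        "Instances": {
            # Transient cluster: terminate once all submitted steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",   # API name for the primary node
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",     # hosts HDFS, so keep On-Demand
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_count,
                },
                {
                    "Name": "task",
                    "InstanceRole": "TASK",     # no HDFS data, safe on Spot
                    "Market": "SPOT",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": task_count,
                },
            ],
        },
    }


config = build_transient_cluster_config()

# Summarize which purchasing option each node role uses.
markets = {
    group["InstanceRole"]: group["Market"]
    for group in config["Instances"]["InstanceGroups"]
}
print(markets)
```

With boto3 installed and credentials configured, a body like this could be passed as `boto3.client("emr").run_job_flow(**config)`, typically together with a `Steps` list so that the cluster has work to run before it terminates.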