AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 36
An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.
Which instance purchasing option will meet these requirements MOST cost-effectively?
Answer options
- A. Run the primary node, core nodes, and task nodes on On-Demand Instances.
- B. Run the primary node, core nodes, and task nodes on Spot Instances.
- C. Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.
- D. Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.
Correct answer: D
Explanation
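Option D is correct. In Amazon EMR, the primary node coordinates the cluster and the core nodes store data in HDFS, so losing either to a Spot interruption can terminate the cluster or lose data. Task nodes only run processing and do not store HDFS data, so they can run on cheaper Spot Instances without risking data loss. Option A avoids data loss but pays On-Demand prices for every node, so it is not the most cost-effective choice. Option B puts the primary and core nodes on Spot Instances, which risks data loss. Option C protects the primary node but still risks data loss on the core nodes, which is unacceptable.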
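As a rough illustration, the node layout in Option D could be expressed as the instance group list that boto3's EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`. This is a minimal sketch, not a production configuration: the helper name `build_instance_groups`, the instance type `m5.xlarge`, and the node counts are assumptions chosen for the example. Note that the EMR API still uses the role name `MASTER` for the primary node.

```python
from typing import Dict, List


def build_instance_groups(
    core_count: int = 2,               # assumed count, for illustration only
    task_count: int = 4,               # assumed count, for illustration only
    instance_type: str = "m5.xlarge",  # assumed instance type
) -> List[Dict]:
    """Return EMR instance groups matching Option D.

    Primary and core nodes use On-Demand capacity so the cluster
    coordinator and the HDFS data survive; task nodes use Spot capacity
    because they hold no HDFS data.
    """
    return [
        {
            "Name": "Primary",
            "InstanceRole": "MASTER",  # API name for the primary node
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",    # stores HDFS data
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
        {
            "Name": "Task",
            "InstanceRole": "TASK",    # compute only, safe to interrupt
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        },
    ]


if __name__ == "__main__":
    for group in build_instance_groups():
        print(group["InstanceRole"], group["Market"])
```

The returned list would be passed as `Instances={"InstanceGroups": build_instance_groups(), ...}` to `run_job_flow`, alongside the other parameters that call needs (cluster name, release label, IAM roles), which are omitted here.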