AWS Certified Machine Learning – Specialty — Question 280
A company is building custom deep learning models in Amazon SageMaker by using training and inference containers that run on Amazon EC2 instances. The company wants to reduce training costs but does not want to change the current architecture. The SageMaker training job can finish after interruptions. The company can wait days for the results.
Which combination of resources should the company use to meet these requirements MOST cost-effectively? (Choose two.)
Answer options
- A. On-Demand Instances
- B. Checkpoints
- C. Reserved Instances
- D. Incremental training
- E. Spot Instances
Correct answer: B, E
Explanation
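Spot Instances (managed spot training in SageMaker) use spare EC2 capacity and can cut training costs by up to 90% compared with On-Demand Instances, which makes them the cheapest fit for a job that tolerates interruptions and can wait days for results. Because a Spot Instance can be reclaimed at any time, checkpoints should be enabled so that the training job periodically saves the model's state to Amazon S3 and resumes from the last saved point instead of starting over. Neither change alters the existing container-based architecture. On-Demand Instances offer no discount, and incremental training starts from an existing model's artifacts rather than lowering the price of the compute itself.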
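As a rough illustration of how answers B and E combine, the sketch below builds the request body for the SageMaker `CreateTrainingJob` API with managed spot training and a checkpoint location enabled. The helper name `spot_training_request`, the instance type, the volume size, and all ARNs, image URIs, and S3 paths are hypothetical placeholders, not values from the question; the function only assembles a dictionary and does not call AWS.

```python
def spot_training_request(job_name, image_uri, role_arn, output_s3, checkpoint_s3,
                          max_run_s=24 * 3600, max_wait_s=72 * 3600):
    """Build a CreateTrainingJob request that uses Spot capacity plus checkpoints.

    Hypothetical helper for illustration; all arguments are placeholders.
    """
    # The wait limit covers time spent waiting for Spot capacity plus run time,
    # so it must be at least as large as the run-time limit.
    if max_wait_s < max_run_s:
        raise ValueError("max_wait_s must be >= max_run_s")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # existing custom training container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "ResourceConfig": {
            "InstanceType": "ml.p3.2xlarge",  # assumed instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,             # assumed volume size
        },
        # Answer E: run the job on Spot capacity.
        "EnableManagedSpotTraining": True,
        # Answer B: sync checkpoints between the container and S3.
        "CheckpointConfig": {
            "S3Uri": checkpoint_s3,
            "LocalPath": "/opt/ml/checkpoints",
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
    }


request = spot_training_request(
    job_name="dl-spot-demo",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-train:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    output_s3="s3://example-bucket/output/",
    checkpoint_s3="s3://example-bucket/checkpoints/",
)
# With credentials configured, the request could be submitted with:
#   boto3.client("sagemaker").create_training_job(**request)
```

The training script inside the container still has to write its checkpoints to the local checkpoint path and reload the latest one on startup; SageMaker only copies files between that path and S3.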