AWS Certified Machine Learning – Specialty — Question 202

An ecommerce company wants to train a large image classification model with 10,000 classes. The company runs multiple model training iterations and needs to minimize operational overhead and cost. The company also needs to avoid loss of work and model retraining.

Which solution will meet these requirements?

Answer options

Correct answer: D

Explanation

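The correct answer is D. Managed spot training in Amazon SageMaker runs training jobs on Spot capacity, which lowers cost, and SageMaker itself handles Spot interruptions, which keeps operational overhead low. With checkpointing enabled, checkpoints are saved to Amazon S3 and an interrupted job resumes from the latest checkpoint instead of starting over, so work is not lost and the model does not have to be retrained from scratch. Option A may reduce costs but lacks checkpointing. Option B uses Spot Instances but does not provide a robust way to avoid loss of work. Option C is not suitable for large models and does not address the need to maintain job state.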
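
As an illustration of the setup the explanation describes, the sketch below builds a request body for the SageMaker CreateTrainingJob API with managed spot training and checkpointing turned on. It only constructs a dictionary and does not call AWS. The helper name build_spot_training_request, the instance type, the volume size, and the time limits are assumptions chosen for the example, not values from the question.

```python
def build_spot_training_request(job_name, image_uri, role_arn,
                                checkpoint_s3_uri, output_s3_uri,
                                max_runtime_s=86400, max_wait_s=172800):
    """Build a CreateTrainingJob request with managed spot training.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    # MaxWaitTimeInSeconds covers waiting for Spot capacity plus the run
    # itself, so it must not be smaller than MaxRuntimeInSeconds.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")

    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.p3.2xlarge",  # assumed GPU instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 100,            # assumed volume size
        },
        # Run the job on Spot capacity managed by SageMaker.
        "EnableManagedSpotTraining": True,
        # Files written to LocalPath are synced to S3Uri, and copied back
        # to LocalPath when the job restarts after an interruption.
        "CheckpointConfig": {
            "S3Uri": checkpoint_s3_uri,
            "LocalPath": "/opt/ml/checkpoints",
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
    }
```

The resulting dictionary could be passed to boto3.client("sagemaker").create_training_job(**request). Checkpointing only prevents loss of work if the training script itself writes checkpoints to the local path and reloads the latest one on startup.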