AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 44

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.
The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.
Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: D

Explanation

The correct answer is D because SageMaker Debugger provides built-in monitoring capabilities specifically designed for training jobs, enabling real-time detection of issues like vanishing gradients and overfitting with minimal setup. Options A, B, and C require additional configuration and do not directly address the monitoring needs as effectively or efficiently as SageMaker Debugger does.