AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 206
An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.
The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.
Which solution will meet this requirement?
Answer options
- A. Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.
- B. Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.
- C. Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.
- D. Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.
Correct answer: B
Explanation
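The correct answer is B. SageMaker Profiler annotations mark specific regions of the training script (for example, data loading, the forward pass, and the backward pass), so the resulting report ties GPU and CPU activity to those regions and shows where utilization drops. Option A is insufficient because CloudWatch metrics show GPU utilization over time at the instance level but cannot attribute it to a point in the script. Option C is incorrect because AWS CloudTrail records API activity in the account; it does not collect GPU utilization or GPU memory metrics. Option D is incorrect because SageMaker Model Monitor tracks data quality and model quality for deployed models, not resource utilization during training.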
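To make option B concrete, the following is a minimal sketch of how annotations might wrap the phases of a training loop. It assumes the `smppy` package (the SageMaker Profiler Python module, available in supported SageMaker framework containers) and its `SMProfiler.instance()`, `start_profiling()`, `stop_profiling()`, and `annotate()` calls. The helper `region`, the function `run_training`, and the placeholder steps `load_batch`, `forward`, and `backward` are hypothetical names introduced for illustration; they stand in for the real DDP training code, which the question does not show.

```python
import contextlib

try:
    # Assumption: smppy is only installed in supported SageMaker containers.
    import smppy
except ImportError:
    smppy = None  # local fallback so the sketch still runs without the profiler


@contextlib.contextmanager
def region(name, seen):
    """Annotate a named region of the script (hypothetical helper).

    `seen` records region names in order so the flow can be inspected locally.
    """
    seen.append(name)
    ctx = smppy.annotate(name) if smppy is not None else contextlib.nullcontext()
    with ctx:
        yield


# Hypothetical placeholders for the real data loading and model steps.
def load_batch(step):
    return [float(step)] * 4


def forward(batch):
    return sum(batch)


def backward(loss):
    return loss * 0.5


def run_training(num_steps):
    seen = []
    profiler = smppy.SMProfiler.instance() if smppy is not None else None
    if profiler is not None:
        profiler.start_profiling()
    try:
        for step in range(num_steps):
            # Nested annotations: one per step, one per phase inside the step.
            with region(f"step_{step}", seen):
                with region("data_loading", seen):
                    batch = load_batch(step)
                with region("forward", seen):
                    loss = forward(batch)
                with region("backward", seen):
                    backward(loss)
    finally:
        if profiler is not None:
            profiler.stop_profiling()
    return seen
```

With annotations like these in place, a profiling report can attribute low GPU activity to a named phase (for instance, a slow `data_loading` region that leaves the GPUs idle), which is the kind of code-level insight the question asks for. Profiling must also be enabled on the estimator side for the annotations to be collected; that configuration is not shown here.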