AWS Certified Machine Learning – Specialty — Question 300
A company builds computer-vision models that use deep learning for the autonomous vehicle industry. A machine learning (ML) specialist uses an Amazon EC2 instance that has a CPU:GPU ratio of 12:1 to train the models.
The ML specialist examines the instance metric logs and notices that the GPU is idle half of the time. The ML specialist must reduce training costs without increasing the duration of the training jobs.
Which solution will meet these requirements?
Answer options
- A. Switch to an instance type that has only CPUs.
- B. Use a heterogeneous cluster that has two different instance groups.
- C. Use memory-optimized EC2 Spot Instances for the training jobs.
- D. Switch to an instance type that has a CPU:GPU ratio of 6:1.
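Correct answer: B
Explanation
A GPU that sits idle half of the time is waiting for data: the CPUs on the instance cannot load and preprocess images fast enough to keep the GPU busy, so the job is CPU-bound. A heterogeneous cluster (a feature of Amazon SageMaker training) pairs the GPU instance group with a second group of lower-cost CPU instances that take over the data preprocessing. With the GPU kept busy, the job does not take longer and can finish sooner, and total cost can fall because expensive GPU hours are no longer spent idle. Option D makes the bottleneck worse: a 6:1 CPU:GPU ratio provides half as many CPUs per GPU, so the GPU would wait even longer and the training duration would increase. Option A removes the GPU entirely, which would severely degrade deep learning training performance and increase job duration. Option C does not help either: memory-optimized instances have no GPUs, and Spot Instances can be interrupted, which can lengthen the job.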
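To make the CPU-bound reading of the idle GPU concrete, the following is a minimal sketch of a toy throughput model, not a benchmark. It assumes that preprocessing overlaps with GPU compute and parallelizes perfectly across vCPUs, and it ignores network overhead. The function names, the workload numbers, and both prices are hypothetical values chosen so that a 12:1 ratio reproduces the 50% GPU idle time from the question.

```python
# Toy model of an input-bound training step.
# All numbers are illustrative assumptions, not measurements or real AWS prices.


def step_time(cpu_work: float, gpu_time: float, vcpus_per_gpu: float) -> float:
    """Seconds per training step when preprocessing overlaps with GPU compute.

    cpu_work is the vCPU-seconds of preprocessing needed per batch and is
    assumed to parallelize perfectly across the available vCPUs.
    """
    return max(cpu_work / vcpus_per_gpu, gpu_time)


def gpu_utilization(cpu_work: float, gpu_time: float, vcpus_per_gpu: float) -> float:
    """Fraction of each step during which the GPU is doing useful work."""
    return gpu_time / step_time(cpu_work, gpu_time, vcpus_per_gpu)


def cost_per_step(hourly_price: float, seconds_per_step: float) -> float:
    """Relative cost of one step in arbitrary units (price x time)."""
    return hourly_price * seconds_per_step


GPU_TIME = 1.0           # assumed GPU seconds per batch
CPU_WORK = 24.0          # assumed vCPU-seconds per batch (gives 50% idle at 12:1)
GPU_SLICE_PRICE = 4.0    # hypothetical price of one GPU plus its 12 vCPUs
EXTRA_VCPU_PRICE = 0.05  # hypothetical price of one vCPU on a CPU-only instance

# name -> (vCPUs feeding each GPU, hourly price of that configuration)
scenarios = {
    "12:1 (current)": (12, GPU_SLICE_PRICE),
    "6:1": (6, GPU_SLICE_PRICE),  # price assumed unchanged for simplicity
    "12:1 plus 12 offloaded vCPUs": (24, GPU_SLICE_PRICE + 12 * EXTRA_VCPU_PRICE),
}

for name, (vcpus, price) in scenarios.items():
    t = step_time(CPU_WORK, GPU_TIME, vcpus)
    util = gpu_utilization(CPU_WORK, GPU_TIME, vcpus)
    print(f"{name}: step={t:.2f}s, GPU util={util:.0%}, "
          f"cost/step={cost_per_step(price, t):.2f}")
```

Under these assumptions, halving the vCPUs per GPU doubles the step time, while adding cheap CPU capacity for preprocessing halves it and lowers the cost per step even though the hourly price rises. Real workloads will deviate from this model, so the actual utilization should be confirmed by profiling.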