AWS Certified Machine Learning – Specialty — Question 100

A machine learning specialist is running an Amazon SageMaker endpoint using the built-in object detection algorithm on a P3 instance for real-time predictions in a company's production application. When evaluating the model's resource utilization, the specialist notices that the model is using only a fraction of the GPU.
Which architecture changes would ensure that provisioned resources are being utilized effectively?

Answer options

Correct answer: B

Explanation

Option B is correct because it redeploys the model on an M5 (CPU) instance with Amazon Elastic Inference attached. Elastic Inference provides GPU-backed inference acceleration in smaller increments than a full P3 GPU, so the accelerator can be sized to what the model actually uses instead of paying for a GPU that sits mostly idle. The other options (A, C, D) either leave the provisioned GPU under-utilized or move to an instance type that does not get the right-sized acceleration Elastic Inference provides.
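The change described in the explanation can be sketched as an endpoint configuration. The snippet below is a minimal illustration, not the exam's reference solution: it only builds the request for SageMaker's CreateEndpointConfig API as a plain dictionary and does not call AWS. The model name, the config name suffix, the `ml.m5.large` instance size, and the `ml.eia2.medium` accelerator size are all assumptions chosen for illustration; the source specifies only "M5 instance" and "Elastic Inference".

```python
from typing import Any, Dict


def build_endpoint_config(
    model_name: str,
    instance_type: str = "ml.m5.large",        # assumed M5 size
    accelerator_type: str = "ml.eia2.medium",  # assumed Elastic Inference size
) -> Dict[str, Any]:
    """Build keyword arguments for SageMaker's CreateEndpointConfig call.

    The CPU instance hosts the model; the AcceleratorType field attaches
    an Elastic Inference accelerator to it.
    """
    return {
        "EndpointConfigName": f"{model_name}-ei-config",  # hypothetical name
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
                "AcceleratorType": accelerator_type,
            }
        ],
    }


# "object-detection-model" is a hypothetical model name.
config = build_endpoint_config("object-detection-model")

# With AWS credentials configured, the request could be submitted as:
#   boto3.client("sagemaker").create_endpoint_config(**config)
```

In practice the accelerator size would be picked after measuring the model's latency and throughput needs, then adjusted if the accelerator turns out to be over- or under-provisioned.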