AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 97

A company plans to deploy an ML model for production inference on an Amazon SageMaker endpoint. The average inference payload size will vary from 100 MB to 300 MB. Inference requests must be processed in 60 minutes or less.

Which SageMaker inference option will meet these requirements?

Answer options

Correct answer: B

Explanation

Asynchronous inference is built for this case: it queues incoming requests, reads each payload from Amazon S3, supports payloads up to 1 GB, and allows up to one hour of processing per request. That covers both the 100 MB to 300 MB payloads and the 60-minute limit. Real-time inference and serverless inference do not fit, because both cap payloads in the megabyte range (far below 100 MB) and time out after roughly 60 seconds. Batch transform can handle large data, but it runs offline jobs over whole datasets and does not provide a persistent endpoint that accepts individual requests, which the scenario requires.
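
As a rough illustration of how an asynchronous endpoint differs in practice, the sketch below builds the request arguments for the boto3 `create_endpoint_config` and `invoke_endpoint_async` calls. The model name, endpoint names, S3 URIs, instance type, and helper function names are all hypothetical placeholders; only the dictionary keys are taken from the SageMaker API. Treat it as a minimal sketch under those assumptions, not a complete deployment.

```python
# Minimal sketch: request arguments for a SageMaker asynchronous endpoint.
# All names, S3 URIs, and the instance type are hypothetical placeholders.

MAX_ASYNC_TIMEOUT_SECONDS = 3600  # one hour, matching the 60-minute requirement


def build_async_endpoint_config(config_name, model_name, output_s3_uri,
                                instance_type="ml.m5.xlarge"):
    """Return kwargs for the SageMaker client's create_endpoint_config call."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
        # The presence of AsyncInferenceConfig makes the endpoint asynchronous.
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": output_s3_uri},
        },
    }


def build_async_invocation(endpoint_name, input_s3_uri,
                           timeout_seconds=MAX_ASYNC_TIMEOUT_SECONDS):
    """Return kwargs for the runtime client's invoke_endpoint_async call."""
    if not 1 <= timeout_seconds <= MAX_ASYNC_TIMEOUT_SECONDS:
        raise ValueError("timeout_seconds must be between 1 and 3600")
    return {
        "EndpointName": endpoint_name,
        # The payload is referenced by its S3 location, not sent in the request.
        "InputLocation": input_s3_uri,
        "ContentType": "application/octet-stream",
        "InvocationTimeoutSeconds": timeout_seconds,
    }


def submit(endpoint_name, input_s3_uri):
    """Send one asynchronous request; needs boto3 and AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        **build_async_invocation(endpoint_name, input_s3_uri)
    )
    # The call returns immediately; the result is written to this S3 location.
    return response["OutputLocation"]
```

Because the request carries only an S3 pointer, the 100 MB to 300 MB payload never passes through the invocation call itself, and the caller picks up the result from the output location once processing finishes.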