AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 53
A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.
How should the company deploy the model on Amazon SageMaker to meet these requirements?
Answer options
- A. Use a multi-model serverless endpoint. Enable caching.
- B. Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.
- C. Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.
- D. Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.
Correct answer: D
Explanation
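Option D is correct. A serverless inference endpoint provisions compute only while a request is being processed and scales down to zero between invocations, so the company pays for the nightly run and nothing for the rest of the day. The workload also fits the serverless limits: the 3 MB input is under the 4 MB payload limit, and the run finishes within the 60-second processing timeout. Because only one request arrives each night, a MaxConcurrency of 1 is sufficient.
Option A is not valid because serverless inference does not support multi-model endpoints, and the company has only one model to host. Option B is invalid as written: InitialInstanceCount must be at least 1 when the endpoint is created. An asynchronous endpoint can scale in to zero only through an auto scaling policy, and asynchronous inference targets large payloads and long-running requests, which this workload does not have. Option C is wasteful: a real-time endpoint is designed for continuous, low-latency traffic, and its standard instance-based auto scaling keeps at least one instance running, so the company would pay for idle capacity almost all day to serve a single run that lasts less than a minute.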
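As a rough illustration of option D, the sketch below builds the request for SageMaker's `create_endpoint_config` API with a `ServerlessConfig` whose `MaxConcurrency` is 1. The helper names, the endpoint and model names, the `AllTraffic` variant name, and the 2048 MB memory size are assumptions chosen for the example and do not come from the question. The `deploy` function assumes that `boto3` is installed, that AWS credentials are configured, and that a SageMaker model with the given name already exists.

```python
# Memory sizes accepted by SageMaker Serverless Inference, in MB.
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=1):
    """Build keyword arguments for SageMaker's create_endpoint_config call.

    Hypothetical helper: memory_mb=2048 is an assumed default, not a value
    given in the question. max_concurrency=1 matches option D.
    """
    if memory_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(f"memory_mb must be one of {VALID_MEMORY_SIZES_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                # ServerlessConfig replaces InstanceType and
                # InitialInstanceCount: no instances are provisioned.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def deploy(endpoint_name, config_name, model_name):
    """Create the endpoint config and the endpoint (requires boto3 and AWS access)."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_endpoint_config(
        **build_serverless_endpoint_config(config_name, model_name)
    )
    sagemaker.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
```

A nightly job could then call the endpoint once through the `sagemaker-runtime` client's `invoke_endpoint` operation. Between invocations no instances are running, which is the cost property the question is testing.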