AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 101
An ML engineer has deployed an Amazon SageMaker model to a serverless endpoint in production. The model is invoked by the InvokeEndpoint API operation.
The model's latency in production is higher than the baseline latency in the test environment. The ML engineer thinks that the increase in latency is because of model startup time.
What should the ML engineer do to confirm or deny this hypothesis?
Answer options
- A. Schedule a SageMaker Model Monitor job. Observe metrics about model quality.
- B. Schedule a SageMaker Model Monitor job with Amazon CloudWatch metrics enabled.
- C. Enable Amazon CloudWatch metrics. Observe the ModelSetupTime metric in the SageMaker namespace.
- D. Enable Amazon CloudWatch metrics. Observe the ModelLoadingWaitTime metric in the SageMaker namespace.
Correct answer: C
Explanation
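The correct answer is C. For serverless endpoints, SageMaker publishes the ModelSetupTime metric to CloudWatch. It reports the time taken to launch new compute resources for the endpoint, which depends on the model size, the model download time, and the container startup time. If ModelSetupTime is large on the invocations that show high latency, the cold-start hypothesis is confirmed; if it is small or absent, the cause lies elsewhere. Options A and B rely on SageMaker Model Monitor, which tracks data and model quality (drift, accuracy) rather than invocation latency. Option D is incorrect because ModelLoadingWaitTime is a multi-model endpoint metric: it measures how long a request waits for the target model to be downloaded or loaded, so it does not apply to a serverless endpoint.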
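
As a rough illustration of option C, the sketch below reads ModelSetupTime from CloudWatch and summarizes it. It is not part of the exam material. The helper names (`build_query`, `summarize`, `fetch_setup_time`), the endpoint name, the `AllTraffic` variant name, and the 24-hour window are assumptions chosen for the example. The CloudWatch client is passed in as a parameter, so the same code works with `boto3.client("cloudwatch")` or with a stub.

```python
from datetime import datetime, timedelta, timezone


def build_query(endpoint_name, variant_name="AllTraffic", hours=24, period=300, now=None):
    """Build GetMetricStatistics parameters for the ModelSetupTime metric.

    The variant name, look-back window, and period are illustrative
    defaults (assumptions), not values taken from the question.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelSetupTime",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": period,
        "Statistics": ["Average", "Maximum", "SampleCount"],
    }


def summarize(datapoints):
    """Convert CloudWatch datapoints (microseconds) into a summary in milliseconds."""
    total = sum(dp["SampleCount"] for dp in datapoints)
    if total == 0:
        # No samples in the window: no setup events were recorded.
        return {"samples": 0, "avg_ms": None, "max_ms": None}
    # Weight each period's average by its sample count to get an overall mean.
    weighted = sum(dp["Average"] * dp["SampleCount"] for dp in datapoints)
    return {
        "samples": int(total),
        "avg_ms": weighted / total / 1000.0,
        "max_ms": max(dp["Maximum"] for dp in datapoints) / 1000.0,
    }


def fetch_setup_time(cloudwatch, endpoint_name, **kwargs):
    """Query CloudWatch through the supplied client and summarize the result."""
    response = cloudwatch.get_metric_statistics(**build_query(endpoint_name, **kwargs))
    return summarize(response["Datapoints"])


# Example call (requires AWS credentials; the endpoint name is hypothetical):
#   import boto3
#   print(fetch_setup_time(boto3.client("cloudwatch"), "my-serverless-endpoint"))
```

If the summary shows setup times comparable to the extra latency seen in production, that supports the startup-time hypothesis. An empty result suggests no new compute resources were launched in the window, which points to another cause.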