AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 101

An ML engineer has deployed an Amazon SageMaker model to a serverless endpoint in production. The model is invoked by the InvokeEndpoint API operation.

The model's latency in production is higher than the baseline latency in the test environment. The ML engineer thinks that the increase in latency is because of model startup time.

What should the ML engineer do to confirm or deny this hypothesis?

Answer options

Correct answer: C

Explanation

The correct answer is C because monitoring the ModelSetupTime metric helps determine how long the model takes to initialize, which directly relates to the latency issue. Options A and B focus on model quality rather than startup time, and option D measures a different aspect of the model loading process that may not directly confirm the startup time hypothesis.