Databricks Certified Machine Learning Professional — Question 83
A Machine Learning Engineer needs a continuous deployment pipeline for their models hosted on Databricks Model Serving. The deployment automation should execute after a model is trained and registered using MLflow. The goal of the automation is to deploy the latest version of the model from the MLflow Model Registry only if the model can meet the company’s strict latency requirements (P95 < 300ms) while serving production traffic.
How can the engineer validate that new models meet their latency requirements when served in production?
Answer options
- A. A/B test the latest model with Databricks Model Serving so that the latest model receives 5% of production traffic and the current model receives the rest. Use inference tables to calculate P95 latency and verify it is less than 300ms.
- B. Serve the latest model on Databricks Model Serving and use a load testing client to generate requests at the production request rate. Use the load testing client to calculate P95 latency and verify it is less than 300ms.
- C. Use the MLflow Get Run API to retrieve the model metrics from MLflow Tracking and verify that the model_latency metric is less than 300ms.
- D. Use the MLflow Get Run API to retrieve the model metrics from MLflow Tracking and verify that the inference_latency metric is less than 300ms.
Correct answer: A
Explanation
Option A is correct because the requirement is about latency while serving production traffic. Routing 5% of production traffic to the latest model exposes it to real requests while limiting the impact if it is slow, and the inference tables record those requests so that P95 latency can be calculated for the new model and checked against the 300ms limit. Option B measures latency under synthetic load, which may not match real request payloads and traffic patterns even when the request rate is the same. Options C and D read metrics logged to MLflow Tracking at training time, which do not show how the model performs behind a live serving endpoint.
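The latency check in option A can be sketched in plain Python. This is a minimal illustration, not the Databricks API. It assumes the inference table rows have already been loaded into dictionaries, and the field names `served_model` and `latency_ms` are hypothetical, so the real column names should be taken from the inference table schema. The percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from collections import defaultdict

P95_THRESHOLD_MS = 300.0  # latency requirement stated in the question


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    # Nearest rank: smallest 1-based rank covering pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p95_by_model(rows):
    """Group latencies by served model and compute P95 for each group.

    Each row is a dict with hypothetical keys 'served_model' and 'latency_ms'.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["served_model"]].append(row["latency_ms"])
    return {model: percentile_nearest_rank(lat, 95) for model, lat in grouped.items()}


def meets_latency_requirement(rows, candidate, threshold_ms=P95_THRESHOLD_MS):
    """True if the candidate model's P95 latency is strictly below the threshold."""
    return p95_by_model(rows)[candidate] < threshold_ms
```

A deployment pipeline could call `meets_latency_requirement(rows, "candidate")` once enough requests have been logged for the 5% traffic slice, and promote the new version only when it returns `True`.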