A Generative AI Engineer developed an LLM application using the pay-per-token Foundation…

Question

A Generative AI Engineer developed an LLM application using the pay-per-token Foundation Model API. Now that the application is ready to be deployed, they would like to ensure the model endpoint can serve high incoming volumes of requests in production. What should the Generative AI Engineer consider?

Accepted Answer

Correct answer: D. Deploy the model using provisioned throughput, as it comes with performance guarantees. Pay-per-token endpoints are convenient for development and experimentation, but they are subject to rate limits and do not guarantee throughput. Provisioned throughput reserves serving capacity for the endpoint, which is why it is the option to consider for high-volume production traffic. Options A and C do not address the need to handle increased request volumes, and B could introduce delays without providing a scalable solution.
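As a rough illustration of what option D involves, the sketch below builds the kind of request body used to create a serving endpoint with a provisioned throughput range. It only constructs and prints the JSON and makes no network call. The endpoint name, model name, version, and throughput numbers are hypothetical placeholders. The field names follow the Databricks serving-endpoints REST API (`POST /api/2.0/serving-endpoints`) as best understood here, so check them against the current documentation before use. In practice, valid throughput values depend on the model and come in fixed increments.

```python
import json


def build_provisioned_throughput_payload(endpoint_name, entity_name,
                                         entity_version, min_tps, max_tps):
    """Build a request body for an endpoint with a provisioned throughput range.

    min_tps and max_tps are tokens-per-second bounds. The values used in the
    example below are placeholders, not recommendations.
    """
    if min_tps < 0 or max_tps < min_tps:
        raise ValueError("expected 0 <= min_tps <= max_tps")
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": entity_name,        # Unity Catalog model name
                    "entity_version": entity_version,  # model version, as a string
                    "min_provisioned_throughput": min_tps,
                    "max_provisioned_throughput": max_tps,
                }
            ]
        },
    }


# Hypothetical names and numbers, for illustration only.
payload = build_provisioned_throughput_payload(
    endpoint_name="my-llm-endpoint",
    entity_name="catalog.schema.my_model",
    entity_version="1",
    min_tps=0,
    max_tps=1000,
)
print(json.dumps(payload, indent=2))
```

The resulting body would be sent with a workspace token to the endpoint-creation API, or the same settings could be chosen in the Serving UI. The main decision is the throughput range, which should be sized against the expected peak request volume.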

Databricks Certified Generative AI Engineer Associate — Question 76
