Databricks Certified Generative AI Engineer Associate — Question 27

A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not high enough to justify creating their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application.
What strategy should the Generative AI Engineer use?

Answer options

Correct answer: B

Explanation

The best choice is B. The pay-per-token Foundation Model API charges only for the tokens actually processed, so it is a flexible and cost-effective way to handle lower request volumes without a dedicated endpoint. Option A is incorrect because switching to External Models does not by itself address the cost issue. Option C does not resolve the throughput problem and may compromise model performance. Option D could lead to suboptimal resource utilization and does not address the fundamental issue of cost-effectiveness.
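
The cost argument can be illustrated with a rough break-even calculation. The sketch below is only an illustration: the function names are hypothetical, and every price and volume in it is a made-up placeholder, not actual Databricks pricing. It assumes provisioned throughput is billed per hour the endpoint is up, while pay-per-token is billed per token processed.

```python
# Rough cost comparison between pay-per-token and provisioned throughput.
# All prices and volumes are hypothetical placeholders, not real Databricks rates.

HOURS_PER_MONTH = 730  # approximate hours in a month


def monthly_pay_per_token_cost(requests_per_month, tokens_per_request,
                               price_per_million_tokens):
    """Cost when billed only for the tokens actually processed."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


def monthly_provisioned_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of a dedicated endpoint billed per hour, regardless of traffic."""
    return hourly_rate * hours


def cheaper_option(requests_per_month, tokens_per_request,
                   price_per_million_tokens, hourly_rate):
    """Return the name of the cheaper billing mode for the given workload."""
    ppt = monthly_pay_per_token_cost(
        requests_per_month, tokens_per_request, price_per_million_tokens
    )
    provisioned = monthly_provisioned_cost(hourly_rate)
    return "pay-per-token" if ppt <= provisioned else "provisioned throughput"


if __name__ == "__main__":
    # Hypothetical low-volume workload: 50,000 requests of 1,000 tokens each.
    ppt = monthly_pay_per_token_cost(50_000, 1_000, 2.0)
    provisioned = monthly_provisioned_cost(6.0)
    print(f"pay-per-token: ${ppt:,.2f} per month")
    print(f"provisioned:   ${provisioned:,.2f} per month")
    print("cheaper:", cheaper_option(50_000, 1_000, 2.0, 6.0))
```

With these placeholder figures, the dedicated endpoint costs the same whether or not it receives traffic, while the pay-per-token bill scales with usage. That is why a low request volume favors option B.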