Databricks Certified Generative AI Engineer Associate — Question 74
A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of inference queries against a Foundation Model endpoint over the weekend. They want to take action to prevent this from unintentionally happening again in the future.
What action should they take?
Answer options
- A. Use prompt engineering to instruct the LLM endpoints to refuse too many subsequent queries.
- B. Require that all development code which interfaces with a Foundation Model endpoint must be reviewed by a Staff level engineer before execution.
- C. Build a pyfunc model which proxies to the Foundation Model endpoint and add throttling within the pyfune model.
- D. Configure rate limiting on the Foundation Model endpoints.
Correct answer: D
Explanation
The correct answer is D because configuring rate limiting on the Foundation Model endpoints directly controls the number of requests that can be processed in a given timeframe, thus preventing overload. The other options do not provide a robust, systematic solution to manage and prevent excessive requests, making them less effective in addressing the issue.