A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of…

Question

A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of inference queries against a Foundation Model endpoint over the weekend. They want to take action to prevent this from unintentionally happening again in the future. What action should they take?

Accepted Answer

Correct answer: D. D. Configure rate limiting on the Foundation Model endpoints. — The correct answer is D because configuring rate limiting on the Foundation Model endpoints directly controls the number of requests that can be processed in a given timeframe, thus preventing overload. The other options do not provide a robust, systematic solution to manage and prevent excessive requests, making them less effective in addressing the issue.

Databricks Certified Generative AI Engineer Associate — Question 74

Answer options

Correct answer: D

Explanation