Databricks Certified Generative AI Engineer Associate — Question 74

A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of inference queries against a Foundation Model endpoint over the weekend. They want to take action to prevent this from unintentionally happening again in the future.

What action should they take?

Answer options

Correct answer: D

Explanation

The correct answer is D because configuring rate limiting on the Foundation Model endpoints directly controls the number of requests that can be processed in a given timeframe, thus preventing overload. The other options do not provide a robust, systematic solution to manage and prevent excessive requests, making them less effective in addressing the issue.