Databricks Certified Generative AI Engineer Associate — Question 25
A Generative AI Engineer has created a RAG application that helps employees retrieve answers from an internal knowledge base, such as Confluence pages or Google Drive. The prototype is working and has received some positive feedback from internal company testers. The Generative AI Engineer now wants to formally evaluate the system’s performance and understand where to focus efforts to further improve it.
How should the Generative AI Engineer evaluate the system?
Answer options
- A. Use cosine similarity score to comprehensively evaluate the quality of the final generated answers.
- B. Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow’s built-in evaluation metrics to evaluate the retrieval and generation components.
- C. Benchmark multiple LLMs with the same data and pick the best LLM for the job.
- D. Use an LLM-as-a-judge to evaluate the quality of the final answers generated.
Correct answer: B
Explanation
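Option B is correct because it evaluates the retrieval and generation components separately, so a weak result can be traced to the stage that caused it, and MLflow’s built-in metrics give a structured, repeatable way to measure each stage. Option A is inadequate because cosine similarity only measures how close two embeddings are; it cannot comprehensively judge answer quality, and it does not show which component is failing. Option C compares LLMs rather than evaluating the system as a whole, and it ignores the retrieval component entirely. Option D, LLM-as-a-judge, is a legitimate technique, but applied only to the final answers it cannot tell whether a poor answer came from retrieval or from generation, so it does not reveal where to focus improvement efforts.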
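To make the component-wise approach in option B concrete, here is a minimal plain-Python sketch, assuming a curated dataset in which each question is labeled with its relevant source documents and a reference answer. In practice MLflow’s built-in evaluation (for example `mlflow.evaluate`) would supply the metrics. Here simple versions of precision@k, recall@k, and a SQuAD-style token F1 (a stand-in for an answer-quality metric, not MLflow’s own) are written by hand so the example runs without MLflow. All names (`EvalExample`, `evaluate_components`, `diagnose`, the document IDs, the 0.5 threshold) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class EvalExample:
    """One row of a hypothetical curated evaluation set (field names are illustrative)."""
    question: str
    relevant_doc_ids: Set[str]    # ground truth for the retriever
    retrieved_doc_ids: List[str]  # retriever output, best match first
    reference_answer: str         # ground truth for the generator
    generated_answer: str         # LLM output for this question


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k slots filled by a relevant document."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant documents found in the top k (0.0 if none are labeled)."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, used here as a simple stand-in for an answer-quality metric."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_components(examples: List[EvalExample], k: int = 3) -> Dict[str, Dict[str, float]]:
    """Score retrieval and generation separately, averaged over the dataset."""
    n = len(examples)
    return {
        "retrieval": {
            "precision_at_k": sum(precision_at_k(e.retrieved_doc_ids, e.relevant_doc_ids, k) for e in examples) / n,
            "recall_at_k": sum(recall_at_k(e.retrieved_doc_ids, e.relevant_doc_ids, k) for e in examples) / n,
        },
        "generation": {
            "token_f1": sum(token_f1(e.generated_answer, e.reference_answer) for e in examples) / n,
        },
    }


def diagnose(example: EvalExample, k: int = 3, f1_threshold: float = 0.5) -> str:
    """Hypothetical triage rule: check retrieval first, then generation."""
    if recall_at_k(example.retrieved_doc_ids, example.relevant_doc_ids, k) == 0.0:
        return "retrieval"
    if token_f1(example.generated_answer, example.reference_answer) < f1_threshold:
        return "generation"
    return "ok"


examples = [
    EvalExample("How do I file expenses?", {"conf-12"}, ["conf-12", "gdrive-7", "conf-3"],
                "Submit expenses through the finance portal",
                "Submit expenses through the finance portal"),
    EvalExample("Who approves VPN access?", {"gdrive-2"}, ["conf-9", "conf-4", "gdrive-8"],
                "VPN access requires manager approval",
                "I could not find that information"),
    EvalExample("When does the office close on Fridays?", {"conf-5"}, ["conf-5", "conf-6", "conf-7"],
                "The office closes at 6 pm on Fridays",
                "The office is open 24 hours"),
]

report = evaluate_components(examples, k=3)
labels = [diagnose(e) for e in examples]
print(report)
print(labels)  # ['ok', 'retrieval', 'generation']
```

Scoring the two stages separately is what makes the results actionable: the second question fails because the retriever never surfaced the right document, while the third retrieves correctly but the generator answers poorly. A single end-to-end score would blur these two cases together.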