Databricks Certified Generative AI Engineer Associate — Question 49
A Generative AI Engineer has created a RAG application that helps employees interpret HR documentation. The prototype is working and has received some positive feedback from internal company testers. The Generative AI Engineer now wants to formally evaluate the system’s performance and understand where to focus their efforts to improve it further.
How should the Generative AI Engineer evaluate the system?
Answer options
- A. Use ROUGE score to comprehensively evaluate the quality of the final generated answers.
- B. Use an LLM-as-a-judge to evaluate the quality of the final answers generated.
- C. Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow’s built-in evaluation metrics to perform the evaluation on the retrieval and generation components.
- D. Benchmark multiple LLMs with the same data and pick the best LLM for the job.
Correct answer: C
Explanation
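Option C is correct because a RAG system can fail in two distinct places: the retriever can return the wrong passages, or the generator can produce a poor answer even from good passages. Evaluating retrieval and generation separately on a curated dataset shows which component is responsible, so improvements can be targeted. Option A is wrong because ROUGE only measures n-gram overlap between the final answer and a reference text, so it is not comprehensive and cannot attribute an error to a component. Option B scores only the final answers, so it also cannot show whether a failure came from retrieval or from generation. Option D compares models rather than diagnosing the existing system, and it leaves the retrieval component unexamined.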
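To make the idea of evaluating the two components separately more concrete, here is a minimal sketch in plain Python. It is not MLflow code: the `EvalExample` record, the document IDs, the sample questions, and the token-overlap F1 used as a generation score are all hypothetical choices made for illustration. In practice MLflow's built-in retriever metrics (such as precision at k and recall at k) and its generation metrics would play the role of the hand-written functions below.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalExample:
    """One curated row: ground truth for retrieval AND for generation (hypothetical schema)."""
    question: str
    relevant_doc_ids: set[str]    # documents a correct retriever should return
    retrieved_doc_ids: list[str]  # what the retriever actually returned, in rank order
    reference_answer: str         # answer written by a human expert
    generated_answer: str         # answer produced by the RAG system


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def token_f1(generated: str, reference: str) -> float:
    """Crude generation score: F1 over whitespace tokens (a stand-in metric)."""
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(examples: list[EvalExample], k: int = 3) -> dict[str, float]:
    """Average retrieval and generation metrics separately over the dataset."""
    n = len(examples)
    return {
        "retrieval_precision_at_k": sum(
            precision_at_k(e.retrieved_doc_ids, e.relevant_doc_ids, k) for e in examples) / n,
        "retrieval_recall_at_k": sum(
            recall_at_k(e.retrieved_doc_ids, e.relevant_doc_ids, k) for e in examples) / n,
        "generation_token_f1": sum(
            token_f1(e.generated_answer, e.reference_answer) for e in examples) / n,
    }


# Made-up HR examples; IDs and answers are illustrative only.
examples = [
    EvalExample(
        question="How many vacation days do new hires get?",
        relevant_doc_ids={"hr-policy-12"},
        retrieved_doc_ids=["hr-policy-12", "hr-policy-03", "hr-policy-40"],
        reference_answer="New hires get 15 vacation days",
        generated_answer="New hires get 15 vacation days",
    ),
    EvalExample(
        question="How long is parental leave?",
        relevant_doc_ids={"hr-policy-07", "hr-policy-08"},
        retrieved_doc_ids=["hr-policy-21", "hr-policy-07", "hr-policy-33"],
        reference_answer="Parental leave is 16 weeks",
        generated_answer="Parental leave is 12 weeks",
    ),
]

results = evaluate(examples, k=3)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Reading the retrieval scores next to the generation score is what makes the diagnosis possible: low recall with a reasonable generation score points at the retriever, while good retrieval with a weak generation score points at the prompt or the model.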