AWS Certified AI Practitioner (AIF-C01) — Question 328
A company is evaluating several large language models (LLMs) for a text summarization task. The company needs to select a metric to evaluate the quality of the summaries that the LLMs generate.
Which metric will meet this requirement?
Answer options
- A. Recall
- B. Area under the ROC curve (AUC)
- C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
- D. Mean squared error (MSE)
Correct answer: C
Explanation
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics specifically designed to evaluate automatic summarization and machine translation by comparing generated text to reference summaries. In contrast, Recall and Area under the ROC curve (AUC) are standard metrics for classification models, while Mean squared error (MSE) is used to measure the performance of regression models. Therefore, ROUGE is the correct metric for assessing the quality of LLM-generated summaries.