
Question

Which metric is used to evaluate the performance of foundation models (FMs) for text summarization tasks?

Accepted Answer

Correct answer: B. Bilingual Evaluation Understudy (BLEU) score. BLEU measures the quality of model-generated text by comparing its n-gram overlap with one or more reference texts. It was originally developed for machine translation, but the same comparison applies to a generated summary and its reference summaries, which makes it the appropriate choice among these options for text summarization. The F1 score and accuracy evaluate classification outputs, and mean squared error (MSE) evaluates numeric regression outputs, so none of them measures how closely generated text matches a reference.
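To make the comparison against a reference concrete, here is a minimal Python sketch of a BLEU-style score. It is an illustration under simplifying assumptions, not the reference implementation: the helper names `ngrams` and `simple_bleu` are hypothetical, tokenization is plain whitespace splitting, only a single reference is supported, n-gram weights are uniform, and no smoothing is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Hypothetical helper for illustration only: whitespace tokenization,
    uniform n-gram weights, and no smoothing.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            # Without smoothing, any zero n-gram precision gives a score of 0.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))  # identical text
    print(simple_bleu("the cat sat on a mat", reference))    # one word changed
```

An identical candidate scores the maximum, and the score drops as fewer n-grams match the reference. Real evaluations would more likely rely on an established library implementation that handles smoothing and multiple references.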

AWS Certified AI Practitioner (AIF-C01) — Question 193

Answer options

Correct answer: B

Explanation