AWS Certified AI Practitioner (AIF-C01) — Question 193

Which metric is used to evaluate the performance of foundation models (FMs) for text summarization tasks?

Answer options

Correct answer: B

Explanation

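The Bilingual Evaluation Understudy (BLEU) score measures how closely model-generated text matches one or more human-written reference texts by counting overlapping n-grams (short word sequences), with a penalty for outputs that are too short. BLEU was originally developed for machine translation, but the same reference-comparison approach applies to text summarization, which makes it the appropriate choice among these options. The F1 score and accuracy are classification metrics, and mean squared error (MSE) is a regression metric; none of them compares generated text against a reference summary, so they do not measure summarization quality.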
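
To make the n-gram comparison concrete, here is a minimal Python sketch of a simplified BLEU calculation. It is an illustration under stated assumptions, not a reference implementation: the function names `ngrams` and `simple_bleu` are hypothetical, only a single reference is supported, tokenization is a plain whitespace split, n-grams up to length 2 are used by default, and no smoothing is applied. Production evaluations would normally rely on an established library instead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU with uniform weights over 1..max_n.

    Assumptions (not part of any official definition used by the exam):
    whitespace tokenization, lowercasing, and no smoothing.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a matching word cannot inflate the score.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("the cat is on the mat", reference))
print(simple_bleu("dogs bark", reference))
```

An identical summary scores highest, a summary with one changed word scores lower because fewer unigrams and bigrams match, and a summary sharing no words with the reference scores zero. Accuracy, F1, and MSE have no equivalent notion of partial overlap with a reference text, which is why they do not fit this task.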