AWS Certified AI Practitioner (AIF-C01) — Question 193
Which metric is used to evaluate the performance of foundation models (FMs) for text summarization tasks?
Answer options
- A. F1 score
- B. Bilingual Evaluation Understudy (BLEU) score
- C. Accuracy
- D. Mean squared error (MSE)
Correct answer: B
Explanation
The Bilingual Evaluation Understudy (BLEU) score measures the quality of generated text by counting the n-gram overlap between a model's output and one or more reference texts. It was originally developed for machine translation, but the same comparison works against reference summaries, which makes it the best fit among these options for text summarization. The F1 score and accuracy are primarily classification metrics, and mean squared error (MSE) is a regression metric; in their standard form, none of them compares generated text against a reference text.
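To make the n-gram comparison concrete, here is a minimal sketch of a simplified BLEU calculation. It is not the scoring code of any AWS service or library: the function names `ngrams` and `simple_bleu` are hypothetical, and the sketch assumes a single reference, whitespace tokenization, uniform weights, no smoothing, and n-grams up to length 2 (the standard metric usually goes up to 4).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU (illustrative assumption, no smoothing)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a matching word cannot inflate the score.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat is on the mat"
    print(round(simple_bleu(generated, reference), 3))
```

With these inputs, 5 of 6 unigrams and 3 of 5 bigrams match the reference, so the score is the square root of (5/6 × 3/5), about 0.707. An identical candidate and reference would score 1.0, and a candidate sharing no words with the reference would score 0.0.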