AWS Certified AI Practitioner (AIF-C01) — Question 253
A company uses Amazon SageMaker AI to generate article summaries in multiple languages. The company needs a metric to evaluate the quality of the summary translations in multiple languages.
Which evaluation metric will meet these requirements?
Answer options
- A. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
- B. Bilingual evaluation understudy (BLEU)
- C. Area Under the ROC Curve (AUC)
- D. Precision
Correct answer: B
Explanation
The correct answer is B. Bilingual evaluation understudy (BLEU) was designed to evaluate machine translation: it scores a candidate translation by how closely its n-grams match one or more reference translations. ROUGE (A) is also an n-gram overlap metric, but it is recall-oriented and is used mainly to evaluate summarization rather than translation quality. AUC (C) and Precision (D) are classification metrics and do not measure the quality of generated text.
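To make the idea of comparing a translation with a reference concrete, here is a minimal sketch of a sentence-level BLEU score in plain Python. It is an illustration only, not how SageMaker AI computes the metric. It assumes whitespace tokenization, a single reference, and n-grams up to length 2 (BLEU is usually reported at corpus level with n-grams up to 4). The function names `ngrams`, `modified_precision`, and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's count so repeating a word cannot inflate the score.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of n-gram precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sits on the mat".split()
candidate = "the cat is on the mat".split()
score = bleu(candidate, reference)
print(round(score, 3))  # about 0.707 for this pair
```

In this example 5 of the 6 candidate unigrams and 3 of the 5 candidate bigrams appear in the reference, so the score is the square root of (5/6 × 3/5), roughly 0.707. An identical translation would score 1.0, and a translation that shares no bigrams with the reference would score 0.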