AWS Certified AI Practitioner (AIF-C01) — Question 219
A company has set up a translation tool to help its customer service team handle issues from customers around the world. The company wants to evaluate the performance of the translation tool. The company sets up a parallel data process that compares the responses from the tool to responses from actual humans. Both sets of responses are generated on the same set of documents.
Which strategy should the company use to evaluate the translation tool?
Answer options
- A. Use the Bilingual Evaluation Understudy (BLEU) score to estimate the absolute translation quality of the two methods.
- B. Use the Bilingual Evaluation Understudy (BLEU) score to estimate the relative translation quality of the two methods.
- C. Use the BERTScore to estimate the absolute translation quality of the two methods.
- D. Use the BERTScore to estimate the relative translation quality of the two methods.
Correct answer: B
Explanation
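The correct answer is B. BLEU scores a machine translation by counting how many of its n-grams (short word sequences) also appear in human reference translations, and it penalizes output that is too short. This matches the parallel data setup, where the human responses to the same documents serve as the references. A BLEU score is not an absolute measure of translation quality. Its value depends on the test documents, the references, and the tokenization, so it is meaningful mainly when outputs are compared against the same references. Options A and C are wrong because they treat the score as an absolute quality measure. Option D is wrong because it uses BERTScore, an embedding-based semantic similarity metric, instead of BLEU, which is the established metric for comparing machine translations with human reference translations.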
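To show why BLEU works as a relative measure, here is a minimal from-scratch sketch of sentence-level BLEU. It assumes a single reference, whitespace tokenization, n-grams up to length 4, and no smoothing. The helper names `ngrams` and `bleu` and the example sentences are hypothetical and chosen only for illustration. Real evaluations would normally use a maintained implementation such as sacreBLEU rather than this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Illustrative sentence-level BLEU: one reference, whitespace tokens, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: shorter-than-reference output is scaled down.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the modified n-gram precisions, times the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example: a human reference and two machine outputs for one document.
reference = "the customer asked for a refund on the damaged order"
tool_a = "the customer asked for a refund on the broken order"
tool_b = "customer asked for refund on the damaged order"

score_a = bleu(tool_a, reference)  # about 0.78
score_b = bleu(tool_b, reference)  # about 0.54
print(f"tool A: {score_a:.2f}, tool B: {score_b:.2f}")
```

The only conclusion these numbers support is that tool A matches this reference more closely than tool B on this test data. A score of 0.78 does not mean that 78% of the translation is correct, which is why answer B frames BLEU as a relative measure.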