AWS Certified AI Practitioner (AIF-C01) — Question 219

A company has set up a translation tool to help its customer service team handle issues from customers around the world. The company wants to evaluate the performance of the translation tool. The company sets up a parallel data process that compares the responses from the tool to responses from actual humans. Both sets of responses are generated on the same set of documents.

Which strategy should the company use to evaluate the translation tool?

Answer options

Correct answer: B

Explanation

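The correct answer is B because the Bilingual Evaluation Understudy (BLEU) score is designed to evaluate machine translation by measuring how closely the tool's output matches human reference translations of the same source text. It does this by counting overlapping word sequences (n-grams) between the two and penalizing output that is too short. That matches this scenario exactly: the company already has tool responses and human responses for the same set of documents. The score is a measure of similarity to the human references rather than an absolute measure of quality. Options A and C do not fit this reference-based comparison, while D applies BERTScore, which compares texts by the semantic similarity of their contextual embeddings and is a general text-similarity metric rather than the metric purpose-built for translation evaluation.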
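
To make the idea concrete, here is a minimal sketch of a BLEU-style score in Python. It is a simplified illustration, not the reference implementation: it assumes a single human reference per sentence, whitespace tokenization, and n-grams up to length 2 (standard BLEU typically uses up to 4-grams and is computed over a whole corpus). The function names `ngrams` and `simple_bleu` and the sample sentences are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one human reference (illustrative only)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped precision: a candidate n-gram is credited at most as many
        # times as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


human = "the cat sat on the mat"     # human reference translation (made up)
machine = "the cat is on the mat"    # translation tool output (made up)
print(round(simple_bleu(machine, human), 3))
```

A score closer to 1 means the tool's output overlaps more with the human translation; an identical sentence scores 1.0 and a sentence with no shared words scores 0.0. In practice, an established library would normally be used in place of hand-written code like this.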