AWS Certified AI Practitioner (AIF-C01) — Question 82

A company has built a solution by using generative AI. The solution uses large language models (LLMs) to translate training manuals from English into other languages. The company wants to evaluate the accuracy of the solution by examining the text generated for the manuals.
Which model evaluation strategy meets these requirements?

Answer options

Correct answer: A

Explanation

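The Bilingual Evaluation Understudy (BLEU) score was designed to evaluate machine-translated text. It measures the n-gram overlap between the generated translation and one or more reference translations, so it directly addresses the requirement to assess accuracy by examining the generated text. The other metrics target different tasks: root mean squared error (RMSE) measures the error of numeric predictions in regression, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is used mainly to evaluate summarization, and the F1 score combines precision and recall for classification.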
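To make the idea of n-gram overlap concrete, here is a minimal sketch of a sentence-level BLEU calculation in Python. It is a simplified illustration, not the scoring used by any AWS service or evaluation library: it assumes a single reference translation, whitespace tokenization, n-grams up to length 4 with uniform weights, and no smoothing. The function names `ngrams` and `bleu` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: one reference, whitespace tokens, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a matching word cannot inflate the precision.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", reference))  # exact match
print(bleu("the cat sat on a mat", reference))    # one word differs
print(bleu("dogs run fast in parks today", reference))  # no overlap
```

An exact match scores 1.0, a translation with no shared words scores 0.0, and a near match falls in between. In practice, BLEU is usually computed over a whole corpus with an established library rather than per sentence with hand-written code.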