A company is introducing a mobile app that helps users learn foreign languages. The app m…

Question

A company is introducing a mobile app that helps users learn foreign languages. The app makes text more coherent by calling a large language model (LLM). The company collected a diverse dataset of text and supplemented the dataset with examples of more readable versions. The company wants the LLM output to resemble the provided examples. Which metric should the company use to assess whether the LLM meets these requirements?

Accepted Answer

Correct answer: C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score. ROUGE measures the overlap (shared n-grams and longest common subsequences) between generated text and reference text, so it directly quantifies how closely the LLM output resembles the provided examples of more readable text. Options A, B, and D do not evaluate the alignment of the LLM output with those examples: A concerns training efficiency, B concerns the model's general capabilities, and D concerns performance speed.
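
To make the metric concrete, here is a minimal sketch of how ROUGE-1 and ROUGE-L could be computed for one LLM output against one reference example. The function names (rouge_n, rouge_l) and the lowercase whitespace tokenization are assumptions made for illustration; they are not part of the question or of any AWS service, and a real evaluation would normally rely on a maintained ROUGE library with proper tokenization and stemming.

```python
from collections import Counter


def _tokens(text):
    # Simplifying assumption: lowercase + whitespace split, no stemming.
    return text.lower().split()


def _scores(overlap, cand_total, ref_total):
    # Turn raw overlap counts into precision, recall, and F1.
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(candidate, reference, n=1):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(_tokens(candidate))
    ref = ngrams(_tokens(reference))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return _scores(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate, reference):
    """ROUGE-L: based on the longest common subsequence (LCS) of tokens."""
    cand, ref = _tokens(candidate), _tokens(reference)
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous candidate token
    for c_tok in cand:
        curr = [0]
        for j, r_tok in enumerate(ref, start=1):
            if c_tok == r_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _scores(prev[-1], len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical example pair: a reference rewrite and an LLM output.
    reference = "the cat sat on the mat"
    llm_output = "the cat is on the mat"
    print("ROUGE-1:", rouge_n(llm_output, reference, n=1))
    print("ROUGE-L:", rouge_l(llm_output, reference))
```

Averaging these scores over the whole dataset of (LLM output, reference example) pairs would give a single number for how closely the model's rewrites track the provided examples. Higher recall means more of the reference wording is reproduced in the output.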

AWS Certified AI Practitioner (AIF-C01) — Question 86
