AWS Certified AI Practitioner (AIF-C01) — Question 86
A company is introducing a mobile app that helps users learn foreign languages. The app makes text more coherent by calling a large language model (LLM). The company collected a diverse dataset of text and supplemented the dataset with examples of more readable versions. The company wants the LLM output to resemble the provided examples.
Which metric should the company use to assess whether the LLM meets these requirements?
Answer options
- A. Value of the loss function
- B. Semantic robustness
- C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score
- D. Latency of the text generation
Correct answer: C
Explanation
The correct answer is C. ROUGE measures the n-gram overlap between generated text and reference text, so it directly quantifies how closely the LLM output resembles the provided examples of more readable text. The other options do not measure this alignment: A (value of the loss function) reflects how well the model fits its training data rather than how closely its output matches the examples; B (semantic robustness) measures how much the output changes under small, meaning-preserving changes to the input; and D (latency) measures only how fast the text is generated.
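To make the idea of n-gram overlap concrete, here is a minimal sketch of a ROUGE-N calculation in Python. It is an illustration only, not the implementation used by any AWS service: the function name `rouge_n`, the whitespace tokenization, and the sample sentences are assumptions made for this example, and production tools typically add stemming and other variants such as ROUGE-L.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Simplified ROUGE-N: clipped n-gram overlap between two texts.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate)
    ref = ngrams(reference)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated word cannot be credited more often than it occurs in both texts.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up sample: the reference plays the role of a "more readable" example,
# the candidate plays the role of the LLM output.
reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

rouge_1 = rouge_n(candidate, reference, n=1)  # 5 of 6 reference unigrams matched
rouge_2 = rouge_n(candidate, reference, n=2)  # 3 of 5 reference bigrams matched
print(rouge_1, rouge_2)
```

A score closer to 1 means the output shares more wording with the reference example, which is the property the company in the question wants to measure.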