AWS Certified AI Practitioner (AIF-C01) — Question 100

A company has fine-tuned a large language model (LLM) to answer questions for a help desk. The company wants to determine if the fine-tuning has enhanced the model's accuracy.

Which metric should the company use for the evaluation?

Answer options

Correct answer: C

Explanation

The F1 score is the most suitable metric here because it is the harmonic mean of precision and recall, so it balances the two in a single number. Comparing the F1 score before and after fine-tuning on the same test set shows whether accuracy has improved. Precision on its own measures only how many of the model's positive predictions are correct and ignores the correct answers the model misses. Word error rate measures transcription errors, typically in speech recognition. Neither gives a comprehensive view of the model's overall accuracy. Time to first token measures response latency rather than accuracy.
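
As a rough illustration of how the F1 score combines precision and recall, the sketch below computes all three from counts of true positives, false positives, and false negatives. The function name and the before/after counts are hypothetical and chosen only for the example. They are not taken from the question or from any AWS evaluation tool.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw prediction counts."""
    # Precision: share of positive predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts on the same test set, before and after fine-tuning.
base = precision_recall_f1(tp=60, fp=20, fn=40)
tuned = precision_recall_f1(tp=80, fp=10, fn=20)

print(f"Base model  F1: {base[2]:.3f}")
print(f"Fine-tuned  F1: {tuned[2]:.3f}")
```

With these made-up counts, the fine-tuned model scores higher on F1 because both its precision and its recall improved. A model that raised precision while lowering recall would show a smaller F1 gain, or none at all.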