AWS Certified AI Practitioner (AIF-C01) — Question 100
A company has fine-tuned a large language model (LLM) to answer questions for a help desk. The company wants to determine if the fine-tuning has enhanced the model's accuracy.
Which metric should the company use for the evaluation?
Answer options
- A. Precision
- B. Time to first token
- C. F1 score
- D. Word error rate
Correct answer: C
Explanation
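The F1 score is the most suitable metric for checking whether fine-tuning improved the model's accuracy. It combines precision and recall into a single value (their harmonic mean), so a model cannot score well by optimizing one at the expense of the other. For question answering, F1 is commonly computed from the token overlap between the model's answer and a reference answer. Precision alone measures how much of what the model returns is correct but ignores what it misses. Word error rate measures transcription errors, mainly in speech recognition. Neither gives a comprehensive view of the model's overall accuracy. Time to first token measures response latency, not accuracy.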
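The following minimal Python sketch makes the precision/recall balance concrete. It computes token-overlap F1, the kind of F1 commonly used to score question answering (for example, in SQuAD-style evaluation). The function names, the plain whitespace tokenization, and the help desk answers are hypothetical and only illustrate the technique. A real evaluation would use a held-out set of reference answers and would usually also normalize punctuation. The idea is to score the base model and the fine-tuned model on the same questions and compare their mean F1.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer."""
    # Simple lowercase + whitespace tokenization (an illustrative assumption).
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Tokens shared by both answers; each is counted at most as many times
    # as it appears in both.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0  # also covers an empty prediction
    precision = overlap / len(pred_tokens)  # share of answer tokens that are correct
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)  # harmonic mean


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token F1 over an evaluation set."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("need equal-length, non-empty prediction/reference lists")
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# Hypothetical help desk evaluation set.
references = [
    "restart the router and wait two minutes",
    "open settings and select reset password",
]
base_answers = ["restart your computer", "go to the help page"]
tuned_answers = [
    "restart the router then wait two minutes",
    "open settings and select reset password",
]

if __name__ == "__main__":
    print(round(mean_f1(base_answers, references), 3))   # 0.1
    print(round(mean_f1(tuned_answers, references), 3))  # 0.929
```

In this toy data, the higher mean F1 for the fine-tuned answers is the kind of signal the company would look for. Because F1 penalizes both missing reference content (low recall) and extra or wrong content (low precision), it is a better accuracy check than precision alone.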