Question

A company has fine-tuned a large language model (LLM) to answer questions for a help desk. The company wants to determine if the fine-tuning has enhanced the model's accuracy. Which metric should the company use for the evaluation?

Accepted Answer

Correct answer: C. F1 score. The F1 score is the most suitable of the listed metrics for evaluating the model's accuracy. It is the harmonic mean of precision and recall, so a high score requires the model's answers to be both correct (precision) and complete (recall). Precision on its own measures only how many of the model's positive predictions are correct. Word error rate measures transcription errors in speech recognition. Neither gives a comprehensive view of how accurate the help-desk answers are. Time to first token measures response latency, which is about speed, not accuracy.
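As an illustration of how F1 could be applied here, the sketch below uses token-overlap F1, the convention used in extractive QA benchmarks such as SQuAD, to compare a model's answer against a reference answer: F1 = 2 × precision × recall / (precision + recall). Several details are assumptions made for illustration. The function names `token_f1` and `mean_f1` are hypothetical. The tokenization is a simple lowercase whitespace split. The help-desk questions and answers are invented sample data, not real results. To judge whether fine-tuning helped, the same held-out reference set would be scored for the base and the fine-tuned model.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer.

    Assumption: lowercase whitespace tokenization (no punctuation stripping).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection: tokens that appear in both answers.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)  # share of the answer that is correct
    recall = num_same / len(ref_tokens)      # share of the reference that was covered
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token F1 over a held-out evaluation set."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# Hypothetical help-desk evaluation set (invented for illustration).
references = [
    "reset your password from the account settings page",
    "contact the help desk by email",
]
base_answers = [
    "you can reset it somewhere in settings",
    "call your manager",
]
tuned_answers = [
    "reset your password from the settings page",
    "contact the help desk by email",
]

base_score = mean_f1(base_answers, references)
tuned_score = mean_f1(tuned_answers, references)
print(f"base model F1:       {base_score:.3f}")
print(f"fine-tuned model F1: {tuned_score:.3f}")
```

A higher mean F1 for the fine-tuned model on the same reference set would indicate improved answer accuracy. Production evaluations typically normalize text further (for example, removing punctuation and articles) before computing overlap.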

AWS Certified AI Practitioner (AIF-C01) — Question 100
