AWS Certified AI Practitioner (AIF-C01) — Question 228

An education company is building a chatbot whose target audience is teenagers. The company is training a custom large language model (LLM). The company wants the chatbot to speak in the target audience's language style by using creative spelling and shortened words.

Which metric will assess the LLM's performance?

Answer options

Correct answer: B (BERTScore)

Explanation

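The correct answer is BERTScore. BERTScore compares the model's output with reference responses by using contextual embeddings from a pretrained BERT-style model. It therefore measures semantic similarity instead of exact word overlap. This makes it a good fit when the desired output deliberately uses creative spelling and shortened words: a response that says the same thing as the reference in nonstandard spelling can still receive credit. The other metrics are less relevant here. F1 score is a classification metric that balances precision and recall. ROUGE measures n-gram overlap and is mainly used for summarization. BLEU measures n-gram precision and is mainly used for machine translation. Because ROUGE and BLEU reward exact token matches, they tend to penalize the nonstandard spellings that the company wants the chatbot to use.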
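To make the contrast concrete, here is a minimal sketch of BERTScore-style greedy matching. Each token is matched to its most similar counterpart by cosine similarity, and precision, recall, and F1 are averaged from those matches. The token embeddings are hypothetical three-dimensional vectors chosen by hand so that each shortened word sits close to its standard spelling. Real BERTScore takes contextual embeddings from a pretrained model and can optionally apply IDF weighting and baseline rescaling, none of which is modeled here. The `unigram_precision` helper is a simplified stand-in for the exact-match overlap that BLEU and ROUGE build on, not either metric in full.

```python
import math

# Hypothetical toy embeddings: each shortened/creative spelling is placed
# near its standard form. Real BERTScore would use contextual vectors
# produced by a pretrained model instead of a fixed lookup table.
EMBEDDINGS = {
    "see": [1.0, 0.0, 0.0],
    "you": [0.0, 1.0, 0.0],
    "tomorrow": [0.0, 0.0, 1.0],
    "c": [0.9, 0.1, 0.0],
    "u": [0.1, 0.9, 0.0],
    "2moro": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bertscore_style(candidate, reference, embeddings):
    """Return (precision, recall, f1) using greedy max-cosine token matching."""
    cand = [embeddings[t] for t in candidate]
    ref = [embeddings[t] for t in reference]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that appear verbatim in the reference."""
    ref_set = set(reference)
    return sum(t in ref_set for t in candidate) / len(candidate)


reference = ["see", "you", "tomorrow"]
candidate = ["c", "u", "2moro"]

print(unigram_precision(candidate, reference))
print(bertscore_style(candidate, reference, EMBEDDINGS))
```

With these toy vectors, the exact-match overlap is 0.0 because no candidate token appears verbatim in the reference. The embedding-based F1 is about 0.99 because every shortened word lands close to its standard spelling.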