AWS Certified AI Practitioner (AIF-C01) — Question 101

A company has developed a generative text summarization model by using Amazon Bedrock. The company will use Amazon Bedrock automatic model evaluation capabilities.

Which metric should the company use to evaluate the accuracy of the model?

Answer options

Correct answer: C (BERTScore)

Explanation

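BERTScore measures how closely generated text matches a reference text. It compares contextual BERT embeddings of their tokens instead of requiring exact word overlap, so a summary that paraphrases the reference can still score well. For this reason, Amazon Bedrock automatic model evaluation uses BERTScore to measure accuracy for text summarization tasks. The other options do not fit. The Area Under the ROC Curve (AUC) score and the F1 score are designed for tasks with discrete labels, such as classification. The Real World Knowledge (RWK) score is the metric Bedrock uses to measure accuracy for general text generation, where it tests the model's factual knowledge rather than how well a summary matches its reference.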
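The embedding-matching idea behind BERTScore can be sketched in a few lines. This is a minimal sketch, not how Amazon Bedrock computes the score internally. It assumes hand-made two-dimensional word vectors (the hypothetical `TOY_EMBEDDINGS` table) in place of the contextual BERT embeddings the real metric uses, and it omits the optional IDF weighting and baseline rescaling. Each candidate token is greedily matched to its most similar reference token by cosine similarity (precision), each reference token to its most similar candidate token (recall), and the two are combined into an F1 value.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy embeddings (assumption): real BERTScore uses contextual
# vectors produced by a BERT-style model, not a fixed word-to-vector table.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "sales": [1.0, 0.0],
    "revenue": [0.9, 0.1],
    "rose": [0.0, 1.0],
    "increased": [0.1, 0.9],
    "fell": [0.0, -1.0],
}


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def bertscore(candidate: List[str], reference: List[str],
              embed: Dict[str, Vector]) -> Tuple[float, float, float]:
    """Greedy-matching BERTScore sketch (no IDF weighting, no rescaling)."""
    cand = [embed[t] for t in candidate]
    ref = [embed[t] for t in reference]
    # Precision: match every candidate token to its closest reference token.
    precision = sum(max(_cosine(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: match every reference token to its closest candidate token.
    recall = sum(max(_cosine(r, c) for c in cand) for r in ref) / len(ref)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = ["sales", "rose"]
    paraphrase = bertscore(["revenue", "increased"], reference, TOY_EMBEDDINGS)
    contradiction = bertscore(["sales", "fell"], reference, TOY_EMBEDDINGS)
    print(round(paraphrase[2], 3), round(contradiction[2], 3))  # prints 0.994 0.5
```

The paraphrase shares no words with the reference yet scores close to 1, which illustrates why an embedding-based metric suits summaries better than exact-match or label-based metrics such as AUC.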