AWS Certified AI Practitioner (AIF-C01) — Question 84
A social media company wants to use a large language model (LLM) to summarize messages. The company has chosen a few LLMs that are available on Amazon SageMaker JumpStart. The company wants to compare the generated output toxicity of these models.
Which strategy gives the company the ability to evaluate the LLMs with the LEAST operational overhead?
Answer options
- A. Crowd-sourced evaluation
- B. Automatic model evaluation
- C. Model evaluation with human workers
- D. Reinforcement learning from human feedback (RLHF)
Correct answer: B
Explanation
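The correct answer is B, Automatic model evaluation. An automatic evaluation runs each candidate model against a prompt dataset and scores the outputs with a predefined metric, here toxicity, so the company can compare the LLMs without recruiting, instructing, or managing human reviewers. SageMaker's foundation model evaluations include toxicity among the metrics available for automatic evaluation. Options A and C both depend on people rating the outputs, which adds coordination, cost, and time. Option D, reinforcement learning from human feedback (RLHF), is a technique for fine-tuning a model with human preference data rather than a way to compare existing models, and it requires the most setup and ongoing effort of the four options.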
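To make the idea of automatic evaluation concrete, the following is a minimal sketch in plain Python, not the SageMaker API. It assumes each model's summaries have already been collected, and it uses a hypothetical word-list scorer (`toy_toxicity_score`) as a stand-in for a trained toxicity detector. The model names, sample summaries, and helper functions are all invented for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical stand-in for a real toxicity detector. A real automatic
# evaluation would score text with a trained classifier, not a word list.
TOY_TOXIC_WORDS = {"idiot", "stupid", "hate"}


def toy_toxicity_score(text: str) -> float:
    """Return the fraction of words found in the toy word list (0.0 to 1.0)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in TOY_TOXIC_WORDS for w in words) / len(words)


def compare_models(
    outputs_by_model: Dict[str, List[str]],
    scorer: Callable[[str], float] = toy_toxicity_score,
) -> Dict[str, float]:
    """Average the toxicity score over each model's outputs."""
    return {
        name: mean(scorer(text) for text in outputs)
        for name, outputs in outputs_by_model.items()
    }


def least_toxic(scores: Dict[str, float]) -> str:
    """Return the name of the model with the lowest mean toxicity score."""
    return min(scores, key=scores.get)


# Made-up summaries from two hypothetical models for the same two messages.
sample_outputs = {
    "model_a": [
        "The user asked about shipping times.",
        "The thread discusses a refund.",
    ],
    "model_b": [
        "The user is an idiot about shipping.",
        "The thread discusses a refund.",
    ],
}

scores = compare_models(sample_outputs)
print(scores)
print("Least toxic:", least_toxic(scores))
```

Once the outputs are generated, the whole comparison runs without any human in the loop, which is why this approach has the least operational overhead. Swapping in a stronger scorer changes only the `scorer` argument.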