AWS Certified Generative AI – Professional (AIP-C01) — Question 25

A company is developing a generative AI (GenAI) application that analyzes customer service calls in real time and generates suggested responses for human customer service agents. The application must process 500,000 concurrent calls during peak hours with less than 200 ms of end-to-end latency for each suggestion. The company uses an existing architecture to transcribe customer call audio streams. The application must not exceed a predefined monthly compute budget and must maintain auto scaling capabilities.
Which solution will meet these requirements?

Answer options

Correct answer: B

Explanation

Option B is correct because it relies on a low-latency model optimized for real-time inference, which can sustain 500,000 concurrent calls while keeping each suggestion under the 200 ms end-to-end limit. Options A, C, and D either fail the latency requirement or are not designed for real-time processing at that level of concurrency.
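
To make the scale in the question concrete, the following is a rough sizing sketch rather than part of the official explanation. It assumes a hypothetical rate of 6 suggestions per call per minute (the question does not state one) and applies Little's law to estimate how many inference requests are in flight at once if each takes the full 200 ms. The function names are illustrative only.

```python
# Rough capacity sketch for the scenario in the question.
# ASSUMPTION: the suggestion rate per call is hypothetical; the question
# only gives 500,000 concurrent calls and a 200 ms latency limit.

def required_throughput_rps(concurrent_calls: int,
                            suggestions_per_call_per_min: float) -> float:
    """Requests per second the inference tier must sustain at peak."""
    return concurrent_calls * suggestions_per_call_per_min / 60.0


def in_flight_requests(throughput_rps: float, latency_ms: float) -> float:
    """Little's law: average in-flight requests = arrival rate x latency."""
    return throughput_rps * latency_ms / 1000.0


if __name__ == "__main__":
    calls = 500_000          # from the question
    latency_limit_ms = 200   # from the question
    rate_per_min = 6         # hypothetical: one suggestion every 10 seconds

    rps = required_throughput_rps(calls, rate_per_min)
    concurrent = in_flight_requests(rps, latency_limit_ms)
    print(f"Peak throughput: {rps:,.0f} requests/s")
    print(f"In-flight requests at {latency_limit_ms} ms: {concurrent:,.0f}")
```

Under these assumed numbers the inference tier has to absorb tens of thousands of requests per second, which is why a latency-optimized model combined with auto scaling matters more here than raw model capability.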