AWS Certified Generative AI – Professional (AIP-C01) — Question 42

An ecommerce company is using Amazon Bedrock to build a customer service AI assistant. The AI assistant needs to process over 50,000 customer inquiries every day. The AI assistant occasionally experiences traffic spikes of up to 150,000 inquiries every day during promotional events. Analysis shows that 40% of inquiries follow similar patterns that share the same context.
A GenAI developer must design a solution that will ensure low latency and consistent performance for the AI assistant during traffic spikes.
Which solution will meet these requirements MOST cost-effectively?

Answer options

Correct answer: A

Explanation

Option A is correct as it optimally balances performance and cost by configuring latency-optimized inference and using prompt caching for repetitive inquiries, ensuring low latency during traffic spikes. Option B is less cost-effective due to the need for provisioned throughput and model units, while option C introduces unnecessary complexity without directly addressing the latency issue. Option D, while useful for routing, may not be as efficient in handling high volumes of similar inquiries as options A.