An ecommerce company is using Amazon Bedrock to build a customer service AI assistant. Th…

Question

An ecommerce company is using Amazon Bedrock to build a customer service AI assistant. The AI assistant needs to process over 50,000 customer inquiries every day. The AI assistant occasionally experiences traffic spikes of up to 150,000 inquiries every day during promotional events. Analysis shows that 40% of inquiries follow similar patterns that share the same context.
A GenAI developer must design a solution that will ensure low latency and consistent performance for the AI assistant during traffic spikes.
Which solution will meet these requirements MOST cost-effectively?

Accepted Answer

Correct answer: A. A. Configure latency-optimized inference by setting the latency parameter to optimized in the performance configuration of the request to Amazon Bedrock. Use prompt caching to handle the repetitive inquiries. — Option A is correct as it optimally balances performance and cost by configuring latency-optimized inference and using prompt caching for repetitive inquiries, ensuring low latency during traffic spikes. Option B is less cost-effective due to the need for provisioned throughput and model units, while option C introduces unnecessary complexity without directly addressing the latency issue. Option D, while useful for routing, may not be as efficient in handling high volumes of similar inquiries as options A.

AWS Certified Generative AI – Professional (AIP-C01) — Question 42

Answer options

Correct answer: A

Explanation