AWS Certified Generative AI – Professional (AIP-C01) — Question 42
An ecommerce company is using Amazon Bedrock to build a customer service AI assistant. The AI assistant needs to process over 50,000 customer inquiries every day. The AI assistant occasionally experiences traffic spikes of up to 150,000 inquiries every day during promotional events. Analysis shows that 40% of inquiries follow similar patterns that share the same context.
A GenAI developer must design a solution that will ensure low latency and consistent performance for the AI assistant during traffic spikes.
Which solution will meet these requirements MOST cost-effectively?
Answer options
- A. Configure latency-optimized inference by setting the latency parameter to optimized in the performance configuration of the request to Amazon Bedrock. Use prompt caching to handle the repetitive inquiries.
- B. Purchase provisioned throughput and model units (MUs) that are sized to handle peak traffic loads. Use Amazon ElastiCache (Redis OSS) to cache repetitive inquiries.
- C. Use Amazon Bedrock Agents and custom knowledge bases to pre-process customer inquiries. Configure cross-Region inference to distribute traffic.
- D. Use AWS Lambda functions to pre-process requests by using a custom prompt routing mechanism. Use Amazon DynamoDB as a caching layer to handle frequently asked questions.
Correct answer: A
Explanation
Option A is correct as it optimally balances performance and cost by configuring latency-optimized inference and using prompt caching for repetitive inquiries, ensuring low latency during traffic spikes. Option B is less cost-effective due to the need for provisioned throughput and model units, while option C introduces unnecessary complexity without directly addressing the latency issue. Option D, while useful for routing, may not be as efficient in handling high volumes of similar inquiries as options A.