AWS Certified Generative AI – Professional (AIP-C01) — Question 45

A company is using Amazon Bedrock and Anthropic Claude 3 Haiku to develop an AI assistant. The AI assistant normally processes 10,000 requests each hour but experiences surges of up 30,000 requests each hour during peak usage periods. The AI assistant must respond within 2 seconds while operating across multiple AWS Regions.
The company observes that during peak usage periods, the AI assistant experiences throughput bottlenecks that cause increased latency and occasional request timeouts. The company must resolve the performance issues.
Which solution will meet this requirement?

Answer options

Correct answer: B

Explanation

Option B is correct because implementing token batching can significantly reduce API overhead and using cross-Region inference profiles allows for better traffic distribution, addressing the throughput bottlenecks. The other options either rely on insufficient scaling measures or do not effectively manage the distribution of requests across Regions, which is crucial for maintaining response times during peak usage.