AWS Certified Generative AI – Professional (AIP-C01) — Question 65

A company is designing an API for a generative AI (GenAI) application that uses a foundation model (FM) that is hosted on a managed model service. The API must stream responses to reduce latency, enforce token limits to manage compute resource usage, and implement retry logic to handle model timeouts and partial responses.
Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: D

Explanation

Option D is the best choice because it integrates AWS Lambda directly with Amazon Bedrock, so the API can stream responses, enforce token limits, and retry on model timeouts or partial responses without the team managing servers. The other options add operational overhead: option B introduces client-side polling, and option C relies on Amazon ECS, which requires running and maintaining container infrastructure.
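
The token-limit and retry requirements can be illustrated with a minimal Python sketch. It is not the implementation behind option D and it does not call any AWS API. The names `ModelTimeout`, `start_stream`, and `stream_with_retry` are hypothetical, and the count of one token per whitespace-separated word is a simplifying assumption. For brevity the sketch buffers chunks and discards partial output before a retry; a real streaming handler would forward each chunk to the client as it arrives.

```python
import time
from typing import Callable, Iterable, List


class ModelTimeout(Exception):
    """Hypothetical error raised when the model stream stalls or times out."""


def stream_with_retry(
    start_stream: Callable[[int], Iterable[str]],
    max_tokens: int,
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> List[str]:
    """Collect streamed chunks, capping output size and retrying on timeout.

    start_stream(max_tokens) is a hypothetical callable that opens a new
    model stream and yields text chunks. In a real handler it would wrap
    the model service's streaming API.
    """
    for attempt in range(1, max_attempts + 1):
        chunks: List[str] = []
        used = 0
        try:
            for chunk in start_stream(max_tokens):
                # Assumption: one whitespace-separated word ~ one token.
                cost = len(chunk.split())
                if used + cost > max_tokens:
                    return chunks  # budget reached: stop reading early
                chunks.append(chunk)
                used += cost
            return chunks  # stream finished normally
        except ModelTimeout:
            # Partial output is discarded; the next attempt starts over.
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
    return []
```

In this sketch the token cap is passed to the model call and also checked on the receiving side, so an oversized response is cut off even if the service ignores the requested limit.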