Question

A company is designing an API for a generative AI (GenAI) application that uses a foundation model (FM) that is hosted on a managed model service. The API must stream responses to reduce latency, enforce token limits to manage compute resource usage, and implement retry logic to handle model timeouts and partial responses.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: D. Integrate an Amazon API Gateway REST API with an AWS Lambda function that invokes Amazon Bedrock. Use Lambda response streaming to stream responses. Enforce token limits within the Lambda function. Implement retry logic by using Lambda and API Gateway timeout configurations.

Option D is the most efficient choice because the Lambda function invokes Amazon Bedrock directly, streams the response as it is generated, and handles token limits and retry logic in one place, all with minimal operational overhead. The other options either add unnecessary complexity, such as the client-side polling in option B, or rely on heavier architectures, such as Amazon ECS in option C, which increase the operational burden.
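The token-limit and retry behavior that option D places inside the Lambda function can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the exam's reference implementation: `start_stream` is a hypothetical stand-in for whatever call opens the model's streaming response (for example, a wrapper around the Bedrock runtime streaming API), `ModelTimeout` is a hypothetical exception type, and `count_tokens` approximates tokens by whitespace-separated words rather than using the model's real tokenizer.

```python
import time
from typing import Callable, Iterable, Iterator


class ModelTimeout(Exception):
    """Hypothetical error for a model call that timed out."""


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real model tokens.
    return len(text.split())


def stream_with_limits(
    start_stream: Callable[[], Iterable[str]],
    max_tokens: int,
    max_attempts: int = 3,
    backoff_s: float = 0.5,
) -> Iterator[str]:
    """Yield chunks from start_stream(), capping tokens and retrying timeouts."""
    for attempt in range(1, max_attempts + 1):
        used = 0          # tokens emitted in this attempt
        sent_any = False  # True once a chunk has reached the caller
        try:
            for chunk in start_stream():
                cost = count_tokens(chunk)
                if used + cost > max_tokens:
                    return  # token budget reached: stop streaming
                used += cost
                sent_any = True
                yield chunk
            return  # stream completed normally
        except ModelTimeout:
            # Chunks already sent cannot be recalled, so only retry a
            # stream that failed before producing any output. A partial
            # response is surfaced to the caller instead.
            if sent_any or attempt == max_attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
```

In this sketch a timeout before the first chunk is retried with exponential backoff, while a timeout after some output has been streamed is re-raised so that the caller can decide how to handle the partial response. Whether to resume, restart, or fail on a partial response is a design choice the question leaves open.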

AWS Certified Generative AI – Professional (AIP-C01) — Question 65

Explanation