AWS Certified Solutions Architect – Associate (SAA-C03) — Question 418

A company is developing a new machine learning (ML) model solution on AWS. The models are developed as independent microservices that fetch approximately 1 GB of model data from Amazon S3 at startup and load the data into memory. Users access the models through an asynchronous API. Users can send a request or a batch of requests and specify where the results should be sent.

The company provides models to hundreds of users. The usage patterns for the models are irregular. Some models could be unused for days or weeks. Other models could receive batches of thousands of requests at a time.

Which design should a solutions architect recommend to meet these requirements?

Answer options

Correct answer: D

Explanation

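Amazon SQS decouples the asynchronous API from the model services and buffers bursts of incoming requests, so a batch of thousands of messages waits in the queue instead of overwhelming a model. Amazon ECS suits the compute layer because each service fetches about 1 GB of model data from Amazon S3 and loads it into memory at startup; on AWS Lambda that work would repeat on every cold start and add significant latency. Lambda also does not expose vCPUs as something AWS Auto Scaling can adjust: CPU is allocated in proportion to the function's configured memory. With AWS Auto Scaling on ECS, both the number of service tasks and the underlying cluster capacity can scale in to zero during idle periods, which matters for models that go unused for days or weeks, and scale out based on the SQS queue size when batches arrive.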
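
The queue-based scaling described above comes down to simple arithmetic: divide the backlog by the number of messages one task is expected to handle, then cap the result. The following is a minimal sketch of that calculation only, not the way AWS Auto Scaling is configured in practice. The function name, the `messages_per_task` target, and the `max_tasks` cap are assumptions introduced for illustration.

```python
import math


def desired_task_count(queue_depth: int, messages_per_task: int, max_tasks: int) -> int:
    """Return how many ECS tasks a queue backlog would call for.

    queue_depth:       messages currently waiting in the SQS queue
    messages_per_task: assumed backlog one task can work through (hypothetical target)
    max_tasks:         assumed upper bound on tasks for the service (hypothetical cap)
    """
    if messages_per_task <= 0:
        raise ValueError("messages_per_task must be positive")
    if queue_depth <= 0:
        # An empty queue means the model is idle, so no tasks are needed.
        return 0
    # Round up so that a partial share of the backlog still gets a task.
    return min(max_tasks, math.ceil(queue_depth / messages_per_task))
```

For example, with an assumed target of 100 messages per task and a cap of 20 tasks, an empty queue yields 0 tasks, a backlog of 250 messages yields 3 tasks, and a batch of 5,000 messages is capped at 20 tasks.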