AWS Certified Machine Learning – Specialty — Question 116

A technology startup is using complex deep neural networks and GPU compute to recommend the company's products to its existing customers based on each customer's habits and interactions. The solution currently pulls each dataset from an Amazon S3 bucket and loads the data into a TensorFlow model, pulled from the company's Git repository, that runs locally. The job then runs for several hours while continually writing its progress to the same S3 bucket. The job can be paused, restarted, and resumed at any time in the event of a failure, and it is run from a central queue.
Senior managers are concerned about the complexity of the solution's resource management and the costs involved in repeating the process regularly. They ask for the workload to be automated so it runs once a week, starting Monday and completing by the close of business Friday.
Which architecture should be used to scale the solution at the lowest cost?

Answer options

Correct answer: A

Explanation

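Option A is the best choice. Because the job writes its progress to Amazon S3 and can be paused, restarted, and resumed after a failure, it tolerates Spot Instance interruptions, and the Monday-to-Friday window leaves ample slack for restarts. Running the AWS Deep Learning Containers image as an AWS Batch job on GPU-capable Spot Instances therefore captures the Spot discount (up to 90% off On-Demand prices), while Batch provides the job queue the current solution already relies on and provisions and releases compute automatically, so no instances run between weekly jobs. Option B, while cost-effective, does not provide the same level of automation and resource management as AWS Batch. Option C uses AWS Fargate, which does not support GPUs, so it cannot run this GPU-dependent workload. Option D relies on Amazon ECS, a container orchestrator rather than a batch job scheduler, which leaves more of the capacity and scheduling management that the senior managers want to reduce.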
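To make the Option A architecture concrete, here is a minimal sketch, under stated assumptions, of the requests it would involve. The functions only build dictionaries shaped like the keyword arguments of boto3's `batch.create_compute_environment`, `batch.register_job_definition`, `events.put_rule`, and `events.put_targets`; they make no AWS calls. Every name, ARN, image URI, instance type, size, and the Monday 08:00 UTC start time is a hypothetical placeholder, not something taken from the question. The retry strategy retries attempts whose status reason starts with `Host EC2` (the reason reported when the underlying Spot Instance is reclaimed), which relies on the job resuming from its S3 progress.

```python
"""Hypothetical request builders for a weekly AWS Batch GPU job on Spot capacity.

All names, ARNs, image URIs, instance types, and sizes are placeholders.
Each dict matches the keyword arguments of the named boto3 call; nothing
here contacts AWS.
"""


def spot_gpu_compute_environment(name, subnets, security_groups, instance_role):
    """Kwargs for batch.create_compute_environment: managed GPU Spot capacity."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,  # scale to zero instances while the queue is empty
            "maxvCpus": 64,  # assumed upper bound
            "instanceTypes": ["g4dn.xlarge", "g4dn.2xlarge"],  # assumed GPU types
            "subnets": subnets,
            "securityGroupIds": security_groups,
            "instanceRole": instance_role,
        },
    }


def training_job_definition(name, image):
    """Kwargs for batch.register_job_definition: one GPU, retried on Spot loss."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,  # e.g. a Deep Learning Containers TensorFlow GPU image
            "command": ["python", "train.py"],  # hypothetical entry point
            "resourceRequirements": [
                {"type": "GPU", "value": "1"},
                {"type": "VCPU", "value": "4"},
                {"type": "MEMORY", "value": "15000"},  # MiB
            ],
        },
        "retryStrategy": {
            "attempts": 10,
            "evaluateOnExit": [
                # Retry when the host (e.g. a reclaimed Spot Instance) goes away.
                {"onStatusReason": "Host EC2*", "action": "RETRY"},
                # Any other failure ends the job instead of looping.
                {"onReason": "*", "action": "EXIT"},
            ],
        },
    }


def weekly_rule(name):
    """Kwargs for events.put_rule: fire every Monday at 08:00 UTC."""
    return {
        "Name": name,
        "ScheduleExpression": "cron(0 8 ? * MON *)",
        "State": "ENABLED",
    }


def batch_target(rule_name, job_queue_arn, job_definition, events_role_arn):
    """Kwargs for events.put_targets: submit the job to the Batch job queue."""
    return {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "weekly-recommender-training",
                "Arn": job_queue_arn,
                "RoleArn": events_role_arn,
                "BatchParameters": {
                    "JobDefinition": job_definition,
                    "JobName": "weekly-recommender-training",
                },
            }
        ],
    }


if __name__ == "__main__":
    import json

    print(json.dumps(weekly_rule("weekly-recommender"), indent=2))
```

With credentials configured, each dictionary could be passed as keyword arguments, for example `boto3.client("events").put_rule(**weekly_rule("weekly-recommender"))`. Keeping `minvCpus` at 0 is the design choice that matters most for cost here: Batch only launches Spot capacity while a job is queued or running.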