AWS Certified Solutions Architect – Professional (SAP-C02) — Question 447
A company needs to run large batch-processing jobs on data that is stored in an Amazon S3 bucket. The jobs perform simulations. The results of the jobs are not time sensitive, and the process can withstand interruptions.
Each job must process 15-20 GB of data when the data is stored in the S3 bucket. The company will store the output from the jobs in a different Amazon S3 bucket for further analysis.
Which solution will meet these requirements MOST cost-effectively?
Answer options
- A. Create a serverless data pipeline. Use AWS Step Functions for orchestration. Use AWS Lambda functions with provisioned capacity to process the data.
- B. Create an AWS Batch compute environment that includes Amazon EC2 Spot Instances. Specify the SPOT_CAPACITY_OPTIMIZED allocation strategy.
- C. Create an AWS Batch compute environment that includes Amazon EC2 On-Demand Instances and Spot Instances. Specify the SPOT_CAPACITY_OPTIMIZED allocation strategy for the Spot Instances.
- D. Use Amazon Elastic Kubernetes Service (Amazon EKS) to run the processing jobs. Use managed node groups that contain a combination of Amazon EC2 On-Demand Instances and Spot Instances.
Correct answer: B
Explanation
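AWS Batch is purpose-built for batch computing workloads, and the service itself adds no charge beyond the underlying compute resources. Because the jobs are interruptible and not time sensitive, running them entirely on Amazon EC2 Spot Instances gives the lowest compute cost. The SPOT_CAPACITY_OPTIMIZED allocation strategy selects instance types from the Spot capacity pools that are least likely to be interrupted, which reduces the number of job restarts. Adding On-Demand Instances (options C and D) raises costs unnecessarily because the jobs can tolerate interruptions, and Amazon EKS (option D) also adds a per-cluster charge and more operational overhead. AWS Lambda (option A) is a poor fit for 15-20 GB of data per job: a function can run for at most 15 minutes and is limited to 10 GB of ephemeral storage and 10 GB of memory. Provisioned concurrency, which is presumably what the option means by "provisioned capacity", would add further cost.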
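As an illustration of option B, the following minimal sketch builds the request for a Spot-only AWS Batch compute environment, plus a retry strategy that reruns jobs whose Spot host was reclaimed. The function names, the environment name, the subnet and security group IDs, the instance role, and the vCPU ceiling are all hypothetical placeholders, not values from the question. The dictionary keys follow the parameters of the Batch CreateComputeEnvironment and RegisterJobDefinition APIs as exposed by boto3, and the "Host EC2*" status-reason pattern is the one AWS documents for retrying after a Spot reclaim; treat both as assumptions to check against current documentation.

```python
from __future__ import annotations

import json
from typing import Any


def spot_compute_environment_request(
    name: str,
    subnets: list[str],
    security_group_ids: list[str],
    instance_role: str,
    max_vcpus: int = 256,  # hypothetical ceiling; size it to the workload
) -> dict[str, Any]:
    """Build keyword arguments for a Spot-only managed compute environment."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            # Spot only: no On-Demand capacity, matching option B.
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            # Scale to zero when no jobs are queued so idle time costs nothing.
            "minvCpus": 0,
            "maxvCpus": max_vcpus,
            # "optimal" lets Batch choose among several instance families,
            # giving the allocation strategy more Spot pools to pick from.
            "instanceTypes": ["optimal"],
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
        },
    }


def spot_retry_strategy(attempts: int = 5) -> dict[str, Any]:
    """Build a job retry strategy that retries only after a host reclaim."""
    if not 1 <= attempts <= 10:
        raise ValueError("attempts must be between 1 and 10")
    return {
        "attempts": attempts,
        "evaluateOnExit": [
            # Assumed pattern: retry when the EC2 host was terminated.
            {"onStatusReason": "Host EC2*", "action": "RETRY"},
            # Any other failure is treated as a real error and not retried.
            {"onReason": "*", "action": "EXIT"},
        ],
    }


if __name__ == "__main__":
    request = spot_compute_environment_request(
        name="simulation-spot-env",            # hypothetical
        subnets=["subnet-EXAMPLE"],            # hypothetical
        security_group_ids=["sg-EXAMPLE"],     # hypothetical
        instance_role="ecsInstanceRole",       # hypothetical
    )
    print(json.dumps(request, indent=2))
    print(json.dumps(spot_retry_strategy(), indent=2))
    # To submit for real (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("batch").create_compute_environment(**request)
```

Setting minvCpus to 0 keeps the environment from holding idle instances between runs, which suits results that are not time sensitive. The retry strategy would be passed as the retryStrategy of a job definition, so that an interrupted simulation is resubmitted automatically instead of failing the whole batch.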