AWS Certified Solutions Architect – Professional — Question 897

A Solutions Architect is designing the storage layer for a data warehousing application. The data files are large, but they have statically placed metadata at the beginning of each file that describes the size and placement of the file's index. The data files are read in by a fleet of Amazon EC2 instances that store the index size, index location, and other category information about the data file in a database. That database is used by Amazon EMR to group files together for deeper analysis.
What would be the MOST cost-effective, highly available storage solution for this workflow?

Answer options

Correct answer: A

Explanation

Amazon S3 is the most cost-effective, highly available storage option for large files. Range GETs (byte-range fetches) let the EC2 instances read only the metadata at the beginning of each file instead of downloading the entire object. Solutions built on Amazon EFS or Amazon EBS cost significantly more per GB stored, and sharing data across dynamic EC2 and EMR fleets adds operational complexity. Amazon DynamoDB is unsuitable for storing large files because of its 400 KB item size limit and its high throughput costs.
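
To make the Range GET idea concrete, here is a minimal Python sketch of how an EC2 worker might fetch only a file's leading metadata from S3. The header layout (two big-endian 64-bit integers holding the index offset and index size), the helper names, and the bucket and key in the usage note are all assumptions for illustration; the question does not specify them. The only AWS call used is boto3's `get_object` with its `Range` parameter.

```python
import struct
from typing import Callable, NamedTuple

# Hypothetical header layout (not given in the question): two big-endian
# unsigned 64-bit integers, index offset then index size, starting at byte 0.
HEADER_FORMAT = ">QQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


class IndexInfo(NamedTuple):
    offset: int
    size: int


def range_header(start: int, length: int) -> str:
    """Build an HTTP Range value; both byte positions are inclusive."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def parse_header(raw: bytes) -> IndexInfo:
    """Decode the assumed fixed-size header into index offset and size."""
    if len(raw) < HEADER_SIZE:
        raise ValueError(f"expected {HEADER_SIZE} bytes, got {len(raw)}")
    offset, size = struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])
    return IndexInfo(offset, size)


def s3_fetcher(bucket: str, key: str) -> Callable[[str], bytes]:
    """Return a function that reads one byte range of an S3 object."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3")

    def fetch(byte_range: str) -> bytes:
        # Only the requested bytes are transferred, not the whole object.
        resp = s3.get_object(Bucket=bucket, Key=key, Range=byte_range)
        return resp["Body"].read()

    return fetch


def read_index_info(fetch: Callable[[str], bytes]) -> IndexInfo:
    """Fetch just the header bytes and decode them."""
    return parse_header(fetch(range_header(0, HEADER_SIZE)))
```

A worker could call `read_index_info(s3_fetcher("example-bucket", "data/file-0001.bin"))` (placeholder names) and write the result to the database that EMR queries. Passing the fetch function in as a parameter keeps the parsing logic testable without AWS credentials.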