Question

A Solutions Architect is designing the storage layer for a data warehousing application. The data files are large, but they have statically placed metadata at the beginning of each file that describes the size and placement of the file's index. The data files are read in by a fleet of Amazon EC2 instances that store the index size, index location, and other category information about the data file in a database. That database is used by Amazon EMR to group files together for deeper analysis.
What would be the MOST cost-effective, high availability storage solution for this workflow?

Accepted Answer

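Correct answer: A. Store the data files in Amazon S3 and use Range GET for each file's metadata, then index the relevant data.

Amazon S3 is the most cost-effective and highly available storage option for large files. S3 Standard stores objects redundantly across multiple Availability Zones and bills for storage and requests rather than provisioned capacity. The metadata sits at a fixed position at the start of each file, so a Range GET (a GetObject request with an HTTP `Range` header) lets the EC2 instances read only those leading bytes instead of downloading the entire file. Solutions using Amazon EFS or Amazon EBS cost significantly more per GB stored. They also add operational complexity when the data must be shared across dynamic EC2 and EMR fleets; for example, an EBS volume lives in a single Availability Zone and is normally attached to one instance at a time. Amazon DynamoDB is unsuitable for storing large files because of its 400 KB maximum item size, and pushing large payloads through it would also drive up read and write capacity costs.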
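The Range GET pattern can be sketched in Python with boto3. This is a minimal, non-authoritative sketch. The question does not specify the header layout, so this example assumes a hypothetical 16-byte header made of two big-endian unsigned 64-bit integers (index offset, then index size). The helper names `byte_range`, `parse_header`, and `read_index` are also illustrative. The first Range GET fetches only the header, and the second fetches only the index it points to.

```python
import struct

# Hypothetical header layout (an assumption; the question does not define it):
# two big-endian unsigned 64-bit integers at byte 0 -> (index_offset, index_size).
HEADER_FORMAT = ">QQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def byte_range(start: int, length: int) -> str:
    """Build an HTTP Range header value; byte ranges are inclusive on both ends."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def parse_header(raw: bytes) -> tuple:
    """Return (index_offset, index_size) decoded from the fixed-size header."""
    if len(raw) < HEADER_SIZE:
        raise ValueError(f"expected {HEADER_SIZE} header bytes, got {len(raw)}")
    return struct.unpack_from(HEADER_FORMAT, raw, 0)


def read_index(bucket: str, key: str) -> tuple:
    """Fetch only the header and then only the index of one S3 object."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    s3 = boto3.client("s3")
    # Range GET #1: just the leading header bytes, not the whole file.
    head = s3.get_object(Bucket=bucket, Key=key, Range=byte_range(0, HEADER_SIZE))
    offset, size = parse_header(head["Body"].read())
    # Range GET #2: just the index bytes that the header points to.
    idx = s3.get_object(Bucket=bucket, Key=key, Range=byte_range(offset, size))
    return offset, size, idx["Body"].read()
```

In the described workflow, each EC2 worker would call something like `read_index("example-bucket", "warehouse/part-0001.dat")` (the bucket and key are hypothetical placeholders). It would then write the offset, size, and category data into the database that EMR queries. Either way, the bytes transferred per file stay proportional to the header and index, not to the full file size.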

AWS Certified Solutions Architect – Professional — Question 897
