AWS Certified Solutions Architect – Professional — Question 437
Your company is storing millions of sensitive transactions across thousands of 100-GB files that must be encrypted in transit and at rest. Analysts concurrently depend on subsets of files, which can consume up to 5 TB of space, to generate simulations that can be used to steer business decisions.
You are required to design an AWS solution that can cost-effectively accommodate both the long-term storage and the in-flight subsets of data.
Which approach can satisfy these objectives?
Answer options
- A. Use Amazon Simple Storage Service (S3) with server-side encryption, and run simulations on subsets in ephemeral drives on Amazon EC2.
- B. Use Amazon S3 with server-side encryption, and run simulations on subsets in-memory on Amazon EC2.
- C. Use HDFS on Amazon EMR, and run simulations on subsets in ephemeral drives on Amazon EC2.
- D. Use HDFS on Amazon Elastic MapReduce (EMR), and run simulations on subsets in-memory on Amazon Elastic Compute Cloud (EC2).
- E. Store the full data set in encrypted Amazon Elastic Block Store (EBS) volumes, and regularly capture snapshots that can be cloned to EC2 workstations.
Correct answer: A
Explanation
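Amazon S3 provides the most cost-effective and durable long-term storage for large-scale datasets. It natively supports server-side encryption for data at rest, and access over its HTTPS endpoints covers encryption in transit. For processing subsets of up to 5 TB, EC2 ephemeral drives (instance store) are cost-effective because they are included in the instance price and avoid the premium for the high-memory instances that in-memory processing would require, which rules out B and D. Losing instance store data when an instance stops or terminates is acceptable here because the authoritative copy stays in S3. In contrast, keeping an Amazon EMR cluster running to hold the full dataset in HDFS (C and D), or provisioning EBS volumes for the entire dataset (E), would be significantly more expensive.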
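
To make the encryption and sizing requirements behind answer A concrete, here is a minimal Python sketch, not a definitive implementation. It assumes decimal units (1 TB = 1000 GB), a hypothetical bucket name `example-transactions-bucket`, and SSE-S3 (`AES256`) as the server-side encryption mode; the question itself does not specify any of these. The policy uses the standard `aws:SecureTransport` condition key to reject non-TLS requests, and `SSE_UPLOAD_ARGS` is the kind of dictionary that could be passed as `ExtraArgs` to a boto3 upload call.

```python
import json

FILE_SIZE_GB = 100          # from the question: each file is 100 GB
SUBSET_LIMIT_GB = 5 * 1000  # from the question: up to 5 TB (decimal units assumed)

# Assumption: SSE-S3 is used; the question only says "server-side encryption".
SSE_UPLOAD_ARGS = {"ServerSideEncryption": "AES256"}


def max_files_per_subset(file_size_gb=FILE_SIZE_GB, limit_gb=SUBSET_LIMIT_GB):
    """Return how many whole files fit in one analyst's scratch space."""
    return limit_gb // file_size_gb


def tls_only_bucket_policy(bucket_name):
    """Build a bucket policy that denies any request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, used only for illustration.
    print(max_files_per_subset())
    print(json.dumps(tls_only_bucket_policy("example-transactions-bucket"), indent=2))
```

Under these assumptions a full subset is at most 50 files, so the instance store on each simulation host needs at least 5 TB of usable capacity, while the bucket policy and upload arguments address the in-transit and at-rest requirements respectively.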