AWS Certified Solutions Architect – Professional — Question 907

A company has a large on-premises Apache Hadoop cluster with a 20 PB HDFS database. The cluster is growing every quarter by roughly 200 instances and 1 PB. The company's goals are to enable resiliency for its Hadoop data, limit the impact of losing cluster nodes, and significantly reduce costs. The current cluster runs 24/7 and supports a variety of analysis workloads, including interactive queries and batch processing.
Which solution would meet these requirements with the LEAST expense and downtime?

Answer options

Correct answer: A

Explanation

AWS Snowmobile is the most appropriate service for transferring a 20 PB dataset. AWS Snowball would require a large fleet of devices, and AWS Direct Connect would take too long (roughly six months even at a sustained 10 Gbps) and cost too much. For the architecture, combining a persistent EMR cluster that uses Spot Instances for interactive queries with transient, job-specific EMR clusters for batch workloads costs less than running a single, massive persistent cluster around the clock.
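
The transfer reasoning can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper names are hypothetical, units are decimal (1 PB = 10^15 bytes), the network link is assumed to be a fully utilized 10 Gbps connection, and the 80 TB per-device capacity is an assumed figure for a Snowball device that should be checked against current AWS documentation.

```python
import math

PB = 10**15  # decimal petabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def transfer_days(size_bytes: int, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to push size_bytes over a link of link_gbps at the given utilization."""
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400


def devices_needed(size_bytes: int, device_capacity_bytes: int) -> int:
    """Number of fixed-capacity devices required to hold size_bytes."""
    return math.ceil(size_bytes / device_capacity_bytes)


dataset = 20 * PB  # size stated in the question

# Assumption: a dedicated 10 Gbps link running at 100% utilization.
days_10g = transfer_days(dataset, link_gbps=10)

# Assumption: 80 TB of usable capacity per Snowball device.
snowballs = devices_needed(dataset, 80 * TB)

print(f"10 Gbps link: about {days_10g:.0f} days")
print(f"Snowball devices at 80 TB each: {snowballs}")
```

Under these assumptions the network transfer takes about half a year, during which the cluster would add roughly 2 PB more at the stated growth rate, and the device-based approach needs hundreds of shipments. Both figures support moving the data in a single bulk transfer.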