AWS Certified Data Engineer – Associate (DEA-C01) — Question 23
A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.
Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)
Answer options
- A. Use Hadoop Distributed File System (HDFS) as a persistent data store.
- B. Use Amazon S3 as a persistent data store.
- C. Use x86-based instances for core nodes and task nodes.
- D. Use Graviton instances for core nodes and task nodes.
- E. Use Spot Instances for all primary nodes.
Correct answer: B, D
Explanation
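Using Amazon S3 as the persistent data store (Option B) is cost-effective because it decouples storage from compute. Data in S3, accessed through EMRFS, survives cluster termination and scales without adding nodes. HDFS, by contrast, stores data on the core nodes' instance storage or EBS volumes, so capacity grows only by adding instances, and the data is lost when the cluster is terminated. Graviton instances (Option D) offer better price-performance than comparable x86-based instances, so using them for core and task nodes lowers cost while maintaining the current level of performance. The other options fall short. HDFS (Option A) ties storage cost and durability to the cluster. x86-based instances (Option C) cost more for the same performance. Running the primary nodes on Spot Instances (Option E) conflicts with the high-reliability requirement, because a Spot interruption of the primary node can terminate the entire cluster.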
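The following minimal Python sketch makes the recommended setup concrete. It assumes a hypothetical bucket (`example-data-bucket`), release label, instance types, instance counts, and Spark script (`jobs/analysis.py` with hypothetical `--input`/`--output` arguments). It builds a parameter dictionary in the shape accepted by boto3's EMR `run_job_flow` call. The primary and core nodes stay On-Demand, core and task nodes use Graviton (`m6g`/`r6g`) instances, and all data and logs live in S3. Putting task nodes on Spot is an extra assumption drawn from common EMR guidance, because task nodes hold no HDFS data. The question itself does not require it. The `check_best_practices` helper is also hypothetical and only flags the issues discussed in the explanation.

```python
"""Sketch of EMR parameters for a long-running, cost-optimized Spark cluster.

Assumptions (hypothetical): bucket name, release label, instance types,
instance counts, and the Spark script path/arguments are placeholders.
The dict mirrors boto3's EMR `run_job_flow` keyword arguments but is
never sent to AWS here.
"""

# Partial list of Graviton instance families (Graviton2 and Graviton3).
GRAVITON_FAMILIES = ("m6g", "c6g", "r6g", "m7g", "c7g", "r7g")


def is_graviton(instance_type: str) -> bool:
    """Return True if the instance family appears in GRAVITON_FAMILIES."""
    return instance_type.split(".")[0] in GRAVITON_FAMILIES


def build_cluster_config(bucket: str = "example-data-bucket") -> dict:
    data_root = f"s3://{bucket}"  # S3 is the persistent store, not HDFS
    return {
        "Name": "spark-analytics-long-running",
        "ReleaseLabel": "emr-6.15.0",  # assumption: a Graviton-capable release
        "Applications": [{"Name": "Spark"}],
        "LogUri": f"{data_root}/emr-logs/",
        "Instances": {
            "KeepJobFlowAliveWhenNoSteps": True,  # long-running cluster
            "InstanceGroups": [
                # Primary node on On-Demand: losing it ends the cluster.
                {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m6g.xlarge", "InstanceCount": 1},
                # Core nodes on Graviton, On-Demand for reliability.
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "r6g.2xlarge", "InstanceCount": 3},
                # Task nodes on Graviton; Spot assumed acceptable (no HDFS data).
                {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "r6g.2xlarge", "InstanceCount": 2},
            ],
        },
        "Steps": [{
            "Name": "spark-analysis",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Input and output both point at S3 paths.
                "Args": ["spark-submit", f"{data_root}/jobs/analysis.py",
                         "--input", f"{data_root}/raw/",
                         "--output", f"{data_root}/curated/"],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def check_best_practices(config: dict) -> list:
    """Return a list of problems matching the rejected options A, C, and E."""
    problems = []
    for group in config["Instances"]["InstanceGroups"]:
        role = group["InstanceRole"]
        if role == "MASTER" and group["Market"] == "SPOT":
            problems.append("primary node on Spot")
        if role in ("CORE", "TASK") and not is_graviton(group["InstanceType"]):
            problems.append(f"{role} node not Graviton")
    for step in config.get("Steps", []):
        if any(arg.startswith("hdfs://") for arg in step["HadoopJarStep"]["Args"]):
            problems.append("HDFS used as persistent store")
    return problems


if __name__ == "__main__":
    print(check_best_practices(build_cluster_config()))  # prints []
```

To launch a real cluster, you would pass the dictionary to `boto3.client("emr").run_job_flow(**config)` with valid IAM roles and a real bucket. The sketch deliberately stops short of calling AWS.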