
Question

A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.
Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

Accepted Answer

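Correct answer: B, D

B. Use Amazon S3 as a persistent data store.
D. Use Graviton instances for core nodes and task nodes.

Using Amazon S3 as a persistent data store (option B) is cost-effective because it separates storage from compute. Data that Spark reads and writes through EMRFS (s3:// paths) is stored outside the cluster. It therefore survives cluster termination and the loss of core nodes, and the cluster can be resized or replaced without data loss. S3 storage also costs less than HDFS, which keeps replicated copies of every block on the cluster's own EBS or instance-store volumes. Graviton (Arm-based) instances (option D) provide better price-performance than comparable x86-based instances, which lowers the cost of long-running workloads while maintaining performance. The other options, such as keeping data in HDFS or using x86-based instances, cost more for the same reliability and performance in this scenario.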
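To make options B and D concrete, the following Python sketch builds a request dictionary in the shape accepted by the boto3 EMR `run_job_flow` call. The request uses Graviton2 instance types (`m6g`, `r6g`) for every node group, including the core and task groups, and `s3://` URIs for logs, job code, input, and output instead of `hdfs://` paths. The bucket, cluster and step names, release label, instance sizes and counts, and the `aggregate.py` job with its `--input`/`--output` arguments are all hypothetical placeholders. None of them come from the question.

```python
"""Sketch of an EMR cluster request reflecting options B (S3) and D (Graviton)."""
from typing import Any

# Hypothetical locations: all durable data lives in S3, not in cluster HDFS.
LOG_URI = "s3://example-bucket/emr-logs/"
DATA_ROOT = "s3://example-bucket/warehouse/"


def graviton_group(role: str, instance_type: str, count: int) -> dict[str, Any]:
    """Build one instance group; m6g/r6g are Graviton2 (Arm64) instance types."""
    return {
        "Name": f"{role.lower()}-graviton",
        "InstanceRole": role,  # "MASTER", "CORE", or "TASK"
        "Market": "ON_DEMAND",
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


def build_cluster_request() -> dict[str, Any]:
    """Return keyword arguments for a long-running Spark cluster (illustrative values)."""
    return {
        "Name": "spark-analytics",  # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",  # assumption: any Graviton2-capable release
        "Applications": [{"Name": "Spark"}],
        "LogUri": LOG_URI,
        "Instances": {
            "InstanceGroups": [
                graviton_group("MASTER", "m6g.xlarge", 1),
                graviton_group("CORE", "r6g.2xlarge", 3),
                graviton_group("TASK", "r6g.2xlarge", 2),
            ],
            # Keep the cluster running between steps (long-running workload).
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Steps": [
            {
                "Name": "daily-aggregation",  # hypothetical step
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # Job code, input, and output are all S3 URIs, so results
                    # persist after the cluster is resized or terminated.
                    "Args": [
                        "spark-submit",
                        DATA_ROOT + "jobs/aggregate.py",
                        "--input", DATA_ROOT + "raw/",
                        "--output", DATA_ROOT + "curated/",
                    ],
                },
            }
        ],
        # Assumption: these default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

To launch it (not executed here), one would pass the dictionary as `boto3.client("emr").run_job_flow(**build_cluster_request())`. That call requires AWS credentials and the named roles. Because every durable path points to S3, the same request could recreate the cluster after termination without losing data.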

AWS Certified Data Engineer – Associate (DEA-C01) — Question 23

Answer options

Correct answer: B, D

Explanation