AWS Certified Data Analytics – Specialty — Question 82
A company is migrating from an on-premises Apache Hadoop cluster to an Amazon EMR cluster. The cluster runs only during business hours. Due to a company requirement to avoid intraday cluster failures, the EMR cluster must be highly available. When the cluster is terminated at the end of each business day, the data must persist.
Which configurations would enable the EMR cluster to meet these requirements? (Choose three.)
Answer options
- A. EMR File System (EMRFS) for storage
- B. Hadoop Distributed File System (HDFS) for storage
- C. AWS Glue Data Catalog as the metastore for Apache Hive
- D. MySQL database on the master node as the metastore for Apache Hive
- E. Multiple master nodes in a single Availability Zone
- F. Multiple master nodes in multiple Availability Zones
Correct answer: A, C, E
Explanation
The correct answer includes A, C, and E because EMR File System (EMRFS) allows for data to persist beyond the lifecycle of the cluster, and using AWS Glue Data Catalog provides a managed metastore, enhancing availability and integration. Multiple master nodes in a single Availability Zone (E) ensures high availability, whereas options B, D, and F do not fulfill the requirements as effectively as the selected choices.