AWS Certified Data Analytics – Specialty — Question 153
A marketing company collects data from third-party providers and uses transient Amazon EMR clusters to process this data. The company wants to host an Apache Hive metastore that is persistent, reliable, and can be accessed by EMR clusters and multiple AWS services and accounts simultaneously. The metastore must also be available at all times.
Which solution meets these requirements with the LEAST operational overhead?
Answer options
- A. Use AWS Glue Data Catalog as the metastore
- B. Use an external Amazon EC2 instance running MySQL as the metastore
- C. Use Amazon RDS for MySQL as the metastore
- D. Use Amazon S3 as the metastore
Correct answer: A
Explanation
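The correct answer is A. AWS Glue Data Catalog is a fully managed, serverless, Hive-compatible metastore. It persists independently of any EMR cluster, so transient clusters can be terminated without losing table definitions, and there are no servers to provision, patch, or back up. Other AWS services such as Amazon Athena and Amazon Redshift Spectrum can read the same catalog, and it can be shared with other accounts. Options B and C can both serve as an external Hive metastore, but B requires the company to manage the EC2 instance and the MySQL installation itself, and C, although managed, still leaves instance sizing, high-availability configuration, and maintenance to the company. Neither is natively shared with other AWS services. Option D is not suitable because Amazon S3 is an object store that holds the data itself; it does not provide the metastore interface that Hive requires.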
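
As a rough illustration of option A, an EMR cluster is typically pointed at the Glue Data Catalog through configuration classifications supplied at cluster creation. The sketch below only builds that configuration list and does not call AWS. The function name `glue_metastore_configurations` and its `catalog_account_id` parameter are hypothetical names chosen for this example, and the property names are given as understood from the EMR documentation, so verify them against the release guide for the EMR version in use.

```python
import json

# Hive client factory class that routes metastore calls to the Glue Data Catalog
# (as documented for EMR; confirm for your EMR release).
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_configurations(catalog_account_id=None):
    """Build EMR configuration classifications that use Glue as the metastore.

    catalog_account_id is optional and hypothetical in name: when given, it
    is written to the property assumed to select a catalog owned by another
    AWS account (cross-account access also needs matching permissions).
    """
    props = {"hive.metastore.client.factory.class": GLUE_FACTORY}
    if catalog_account_id is not None:
        props["hive.metastore.glue.catalogid"] = catalog_account_id
    # One classification for Hive, one for Spark SQL; each gets its own copy
    # of the properties so later edits to one do not affect the other.
    return [
        {"Classification": "hive-site", "Properties": dict(props)},
        {"Classification": "spark-hive-site", "Properties": dict(props)},
    ]


if __name__ == "__main__":
    # Print the JSON that would be supplied when the cluster is created.
    print(json.dumps(glue_metastore_configurations(), indent=2))
```

The resulting list could be passed as the `Configurations` argument when a transient cluster is launched, or saved as a JSON file for the command line. Because the metadata lives in the catalog rather than on the cluster, every new cluster started with this configuration sees the same tables.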