AWS Certified Big Data – Specialty — Question 21
A company has several teams of analysts, and each team has its own cluster. The teams need to run SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The company needs to enable a centralized metadata layer to expose the Amazon S3 objects as tables to the analysts.
Which approach meets the requirement for a centralized metadata layer?
Answer options
- A. EMRFS consistent view with a common Amazon DynamoDB table
- B. Bootstrap action to change the Hive Metastore to an Amazon RDS database
- C. s3distcp with the outputManifest option to generate RDS DDL
- D. Naming scheme support with automatic partition discovery from Amazon S3
Correct answer: B
Explanation
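The correct answer is B. A Hive Metastore backed by an external Amazon RDS database stores table definitions (schemas, partitions, and S3 locations) outside any single cluster, so every team's EMR cluster can be pointed at the same metastore. Hive, Spark-SQL, and Presto on Amazon EMR can all resolve tables through the Hive Metastore, so analysts see the same S3-backed tables regardless of which engine or cluster they use. Option A is incorrect because EMRFS consistent view uses a DynamoDB table to track S3 object metadata for list and read-after-write consistency; it does not store table definitions. Option C is incorrect because s3distcp copies data between Amazon S3 and HDFS (or between S3 locations), and its outputManifest option only writes a list of the copied files, not a shared catalog. Option D describes partition discovery based on a naming scheme, which does not by itself create a metadata layer shared across clusters.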
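The external-metastore approach (an Amazon RDS database serving as the Hive Metastore for every cluster) can be sketched as an EMR configuration. This is a minimal illustration under stated assumptions, not the exam's reference solution: instead of a bootstrap action it builds a `hive-site` configuration classification, which is the mechanism current EMR releases document for an external metastore. The helper name `shared_metastore_configuration`, the RDS endpoint, database name, and credentials are all hypothetical placeholders.

```python
import json


def shared_metastore_configuration(rds_endpoint: str, database: str,
                                   user: str, password: str) -> list:
    """Build an EMR 'Configurations' list that points Hive at an external
    MySQL-compatible metastore on Amazon RDS.

    Hypothetical helper: every team cluster would be launched with the same
    list so that all of them share one set of table definitions.
    """
    # JDBC URL for the shared metastore database; createDatabaseIfNotExist
    # lets the first cluster initialize the schema database if it is missing.
    jdbc_url = (
        f"jdbc:mysql://{rds_endpoint}:3306/{database}"
        "?createDatabaseIfNotExist=true"
    )
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "javax.jdo.option.ConnectionURL": jdbc_url,
                # Driver class bundled with EMR 5.x+ for MySQL-compatible RDS.
                "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
                "javax.jdo.option.ConnectionUserName": user,
                "javax.jdo.option.ConnectionPassword": password,
            },
        }
    ]


if __name__ == "__main__":
    # Placeholder values only; real credentials belong in a secrets store.
    config = shared_metastore_configuration(
        "metastore.example.us-east-1.rds.amazonaws.com",
        "hive",
        "hiveuser",
        "change-me",
    )
    print(json.dumps(config, indent=2))
```

The returned list could be passed as the `Configurations` argument of boto3's EMR `run_job_flow` call when launching each team's cluster. Because Spark-SQL and Presto on EMR resolve tables through the Hive Metastore, pointing every cluster at the same RDS database gives all three engines one shared view of the S3-backed tables.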