A company is planning to migrate on-premises Apache Hadoop clusters to Amazon EMR. The co…

Question

A company is planning to migrate on-premises Apache Hadoop clusters to Amazon EMR. The company also needs to migrate a data catalog into a persistent storage solution.
The company currently stores the data catalog in an on-premises Apache Hive metastore on the Hadoop clusters. The company requires a serverless solution to migrate the data catalog.
Which solution will meet these requirements MOST cost-effectively?

Accepted Answer

Correct answer: B. Configure a Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company's data catalog as an external data catalog.

Option B is correct because Amazon EMR integrates directly with AWS Glue Data Catalog, a serverless, Hive metastore-compatible service. The migrated catalog is stored outside the cluster, so it persists independently of any EMR cluster and requires no database servers to manage. Option A involves using Amazon S3, which may not provide the same level of integration as AWS Glue. Option C introduces Amazon Aurora MySQL, which adds unnecessary complexity and cost. Option D suggests creating a new metastore, which does not make efficient use of the existing catalog.
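
To make the "external data catalog" part of option B concrete, here is a minimal sketch of the EMR configuration classifications that point Hive (and optionally Spark SQL) at AWS Glue Data Catalog as the metastore. The helper name glue_catalog_configurations and its include_spark parameter are hypothetical and only for illustration. The classification names and the factory class are the ones AWS documents for this integration. The sketch only builds and prints the configuration; it assumes you would pass the result as the Configurations argument when creating a cluster, and it does not cover migrating the existing metastore contents.

```python
import json
from typing import Dict, List

# Client factory class that EMR uses to route Hive metastore calls
# to AWS Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations(include_spark: bool = True) -> List[Dict[str, object]]:
    """Build EMR configuration classifications that use Glue as the metastore.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    # hive-site: makes Hive on EMR use Glue Data Catalog as its metastore.
    configs: List[Dict[str, object]] = [
        {
            "Classification": "hive-site",
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
    ]
    if include_spark:
        # spark-hive-site: same setting for Spark SQL on the cluster.
        configs.append(
            {
                "Classification": "spark-hive-site",
                "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
            }
        )
    return configs


if __name__ == "__main__":
    # Print the JSON that could be supplied as cluster configurations.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

Because the catalog lives in AWS Glue rather than in a database on the cluster, table definitions remain available after a cluster is terminated and can be shared by several clusters.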

AWS Certified Data Engineer – Associate (DEA-C01) — Question 250
