An online retail company is migrating its reporting system to AWS. The company's legacy s…

Question

An online retail company is migrating its reporting system to AWS. The company's legacy system runs data processing on online transactions using a complex series of nested Apache Hive queries. Transactional data is exported from the online system to the reporting system several times a day. Schemas in the files are stable between updates.
A data analyst wants to quickly migrate the data processing to AWS, so any code changes should be minimized. To keep storage costs low, the data analyst decides to store the data in Amazon S3. It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3.
Which solution meets these requirements?

Accepted Answer

Correct answer: A. A. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an AWS Glue crawler over Amazon S3 that runs when data is refreshed to ensure that data changes are updated. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR. — Option A is the correct solution because it effectively uses AWS Glue to manage metadata and ensures that data changes are captured through a crawler that runs when data is refreshed. This approach minimizes code changes while maintaining the integrity of data processing. The other options either introduce unnecessary complexity or do not ensure the data is completely up to date with the latest transactions.

AWS Certified Data Analytics – Specialty — Question 74

Answer options

Correct answer: A

Explanation