A company needs to set up a data catalog and metadata management for data sources that ru…

Question

A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.

The AWS Glue Data Catalog is a serverless, managed metadata repository. Glue crawlers connect to Amazon RDS and Amazon Redshift over JDBC and to Amazon S3 directly, infer the schema of each source, and write table definitions to the Data Catalog. When a crawler runs on a schedule, it rescans the sources and records any schema changes, so both requirements (regular updates and change detection) are met without custom code. Options A and C require writing and maintaining Lambda functions, which adds operational overhead. Option D also uses the Glue Data Catalog but extracts schemas manually, which is unnecessary because crawlers do this automatically.
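As a rough illustration of option B, the sketch below assembles the arguments for the Glue `CreateCrawler` API with boto3. It is a minimal sketch, not the exam's reference solution: the crawler name, IAM role ARN, database name, bucket path, Glue connection names, and the nightly schedule are all hypothetical placeholders, and the Glue connections for RDS and Redshift are assumed to exist already.

```python
from typing import Any, Dict, List


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    jdbc_targets: List[Dict[str, str]],
    schedule: str = "cron(0 2 * * ? *)",  # assumed: daily at 02:00 UTC
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            # Semistructured sources: JSON and XML files in Amazon S3.
            "S3Targets": [{"Path": path} for path in s3_paths],
            # Structured sources: RDS and Redshift via existing Glue connections.
            "JdbcTargets": [
                {"ConnectionName": t["connection"], "Path": t["path"]}
                for t in jdbc_targets
            ],
        },
        # Periodic runs keep the Data Catalog up to date.
        "Schedule": schedule,
        # Tell the crawler what to do when it detects metadata changes.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DEPRECATE_IN_DATABASE",
        },
    }


def create_crawler(request: Dict[str, Any]) -> None:
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(**request)


# All values below are hypothetical placeholders.
request = build_crawler_request(
    name="catalog-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="company_catalog",
    s3_paths=["s3://example-bucket/raw/"],
    jdbc_targets=[
        {"connection": "example-rds-connection", "path": "salesdb/%"},
        {"connection": "example-redshift-connection", "path": "dw/public/%"},
    ],
)
```

Calling `create_crawler(request)` would register the crawler. After that, Glue runs it on the schedule and applies the schema change policy with no further code to maintain, which is the low-overhead property the question asks for.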

AWS Certified Data Engineer – Associate (DEA-C01) — Question 41
