AWS Certified Data Engineer – Associate (DEA-C01) — Question 60

A company extracts approximately 1 TB of data every day from data sources such as SAP HANA, Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. Some of the data sources have undefined data schemas or data schemas that change.
A data engineer must implement a solution that can detect the schema for these data sources. The solution must extract, transform, and load the data to an Amazon S3 bucket. The company has a service level agreement (SLA) to load the data into the S3 bucket within 15 minutes of data creation.
Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: B

Explanation

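Option B is correct because it uses AWS Glue, a serverless ETL service with built-in schema detection. Glue crawlers infer schemas from data stores and record them in the AWS Glue Data Catalog, so sources with undefined or changing schemas do not need hand-maintained schema definitions. Because Glue has no clusters to provision or manage, it also meets the requirement for the least operational overhead. Options A and D rely on Amazon EMR and Amazon Redshift, respectively. EMR typically involves provisioning and tuning clusters, and Redshift is a data warehouse rather than an ETL service for heterogeneous sources, so both add complexity to this pipeline. Option C is feasible, but it does not use Glue's built-in schema detection and transformation capabilities, so that logic would have to be built and maintained separately.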
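
To make the schema-detection point concrete, here is a minimal sketch of how a Glue crawler might be configured with boto3. It is an illustration under stated assumptions, not part of the exam question: the crawler name, IAM role ARN, catalog database, DynamoDB table, and JDBC connection name are all hypothetical placeholders, and only two of the listed source types are shown. The sketch builds the request for the Glue `create_crawler` call and sets a schema change policy so that detected schema changes update the Data Catalog.

```python
from typing import Any, Dict, List


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    dynamodb_tables: List[str],
    jdbc_connection: str,
    jdbc_path: str,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            # One crawler can cover several kinds of data store.
            "DynamoDBTargets": [{"Path": table} for table in dynamodb_tables],
            "JdbcTargets": [
                {"ConnectionName": jdbc_connection, "Path": jdbc_path}
            ],
        },
        "SchemaChangePolicy": {
            # Apply detected schema changes to the Data Catalog tables.
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Only log objects that have disappeared; do not drop tables.
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(request: Dict[str, Any]) -> None:
    """Create the crawler and run it once (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are hypothetical placeholders.
request = build_crawler_request(
    name="daily-sources-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="raw_sources",
    dynamodb_tables=["orders"],
    jdbc_connection="example-sqlserver-connection",
    jdbc_path="sales/dbo/%",
)
```

Calling `create_and_start_crawler(request)` in an account with suitable permissions would register the crawler and start one run. A Glue ETL job could then read the cataloged tables and write the results to the S3 bucket.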