AWS Certified Data Engineer – Associate (DEA-C01) — Question 168

A company uses AWS Glue Data Catalog to index data that is uploaded to an Amazon S3 bucket every day. The company uses daily batch processes in an extract, transform, and load (ETL) pipeline to upload data from external sources into the S3 bucket.

The company runs a daily report on the S3 data. Some days, the company runs the report before all the daily data has been uploaded to the S3 bucket. A data engineer must be able to send a message that identifies any incomplete data to an existing Amazon Simple Notification Service (Amazon SNS) topic.

Which solution will meet this requirement with the LEAST operational overhead?

Answer options

Correct answer: C

Explanation

Option C is correct because it uses AWS Glue workflows to automate the data quality checks and routes the results through Amazon EventBridge to the existing Amazon SNS topic, so incomplete datasets are flagged without any new infrastructure. The checks run inside the same AWS Glue environment that already catalogs the data, which keeps operational overhead low. Options A and B require setting up and managing additional infrastructure (Apache Airflow and Amazon EMR, respectively). Option D uses AWS Lambda and AWS Step Functions, which means building and maintaining custom functions and a state machine instead of relying on the integrated AWS Glue solution.
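
As a rough illustration of the EventBridge-to-SNS part of option C, the sketch below builds an event pattern for failed AWS Glue Data Quality evaluations and attaches the SNS topic as a rule target. It is a minimal sketch, not the exam's reference solution. The rule name, target ID, topic ARN, column name, and helper function names are hypothetical, and the event fields (`aws.glue-dataquality`, `Data Quality Evaluation Results Available`, `state`) are assumed to match what AWS Glue Data Quality publishes in your account. The DQDL ruleset string is only an example of completeness checks.

```python
import json

# Example DQDL ruleset (assumption: "order_id" is a hypothetical column).
# RowCount and IsComplete are DQDL rule types used to detect missing data.
EXAMPLE_RULESET = 'Rules = [ RowCount > 0, IsComplete "order_id" ]'


def build_incomplete_data_pattern() -> dict:
    """Event pattern matching failed Glue Data Quality evaluations (assumed fields)."""
    return {
        "source": ["aws.glue-dataquality"],
        "detail-type": ["Data Quality Evaluation Results Available"],
        "detail": {"state": ["FAILED"]},
    }


def route_failures_to_sns(events_client, rule_name: str, topic_arn: str) -> None:
    """Create an EventBridge rule and point it at an existing SNS topic.

    events_client is expected to behave like boto3.client("events").
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_incomplete_data_pattern()),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "notify-sns" is a hypothetical target ID.
        Targets=[{"Id": "notify-sns", "Arn": topic_arn}],
    )


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   route_failures_to_sns(
#       boto3.client("events"),
#       "glue-dq-incomplete-data",
#       "arn:aws:sns:us-east-1:123456789012:existing-topic",
#   )
```

For the rule to deliver messages, the SNS topic's access policy must also allow `events.amazonaws.com` to publish to it.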