AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 146

A company collects customer data every day. The company stores the data as compressed files in an Amazon S3 bucket that is partitioned by date. Every month, analysts download the data, process the data to check the data quality, and then upload the data to Amazon QuickSight dashboards.

An ML engineer needs to implement a solution to automatically check the data quality before the data is sent to QuickSight.

Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: A

Explanation

Option A is correct because it combines AWS Glue Data Quality rules with a crawler that runs monthly. AWS Glue manages both the schedule and the rule evaluation, so there is no custom code to maintain. Options B and C rely on custom functions and Lambda scripts, which the team would have to write, deploy, and maintain, and that increases operational overhead. Option D relies on event notifications and CloudWatch insights, which are geared toward monitoring rather than rule-based validation and are therefore a poorer fit for automated data quality checks than Glue Data Quality rules.
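
As a rough illustration of what Glue Data Quality rules can look like in practice, the sketch below builds a small ruleset in DQDL (the Data Quality Definition Language) and registers it against a Data Catalog table through a boto3 Glue client. It is a minimal sketch under stated assumptions, not the solution text from option A: the column name customer_id, the ruleset name, the database and table names, and the IAM role ARN are all hypothetical placeholders, and the Glue client is passed in by the caller rather than created here.

```python
def build_ruleset(key_column: str) -> str:
    """Return a small DQDL ruleset string for one key column.

    The column name is a hypothetical placeholder; real rules would
    reflect the actual schema of the customer data.
    """
    rules = [
        "RowCount > 0",                # the monthly partition is not empty
        f'IsComplete "{key_column}"',  # no NULLs in the key column
        f'IsUnique "{key_column}"',    # no duplicate keys
    ]
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


def register_and_run(glue, database: str, table: str, role_arn: str,
                     ruleset_name: str = "monthly-customer-dq") -> str:
    """Create the ruleset on a catalog table and start one evaluation run.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    All names and the role ARN are supplied by the caller (hypothetical here).
    """
    glue.create_data_quality_ruleset(
        Name=ruleset_name,
        Ruleset=build_ruleset("customer_id"),
        TargetTable={"DatabaseName": database, "TableName": table},
    )
    run = glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": database, "TableName": table}},
        Role=role_arn,
        RulesetNames=[ruleset_name],
    )
    return run["RunId"]
```

In this sketch the table is assumed to already exist in the Data Catalog, for example because the monthly crawler has populated it from the date-partitioned S3 bucket. Scheduling the evaluation itself is left out.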