A company collects customer data every day. The company stores the data as compressed fil…

Question

A company collects customer data every day. The company stores the data as compressed files in an Amazon S3 bucket that is partitioned by date. Every month, analysts download the data, process the data to check the data quality, and then upload the data to Amazon QuickSight dashboards. An ML engineer needs to implement a solution to automatically check the data quality before the data is sent to QuickSight. Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: A. Run an AWS Glue crawler every month to update the AWS Glue Data Catalog. Use AWS Glue Data Quality rules to check the data quality.

Option A is correct because it pairs a scheduled monthly crawler with AWS Glue Data Quality rules. The crawler keeps the Data Catalog in step with the new date partitions in S3, and the rules are evaluated against the cataloged table, so the check runs automatically with little to maintain. Options B and C add complexity through custom functions and Lambda scripts, which the team would have to write and maintain, increasing operational overhead. Option D involves event notifications and CloudWatch insights, which is less suited to automated data quality checks than Glue Data Quality rules.
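As a rough illustration of option A, the following Python sketch shows how a monthly crawler and a Glue Data Quality ruleset might be set up with the boto3 Glue client. The resource names (`customer-data-monthly`, `customer-data-quality`), the column names (`customer_id`, `email`), and the specific rules are hypothetical and do not come from the question; the role, database, table, and S3 path are supplied by the caller. The rules are written in DQDL, the rule language used by Glue Data Quality.

```python
# Hypothetical DQDL rules; the column names are assumptions for illustration.
RULES = [
    "RowCount > 0",
    'IsComplete "customer_id"',
    'IsUnique "customer_id"',
    'Completeness "email" > 0.95',
]

RULESET_NAME = "customer-data-quality"  # hypothetical name
CRAWLER_NAME = "customer-data-monthly"  # hypothetical name


def build_dqdl_ruleset(rules):
    """Join individual DQDL rule strings into one ruleset document."""
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


def setup_monthly_check(glue, role_arn, database, table, s3_path):
    """Create a crawler scheduled monthly and a ruleset tied to the table.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    glue.create_crawler(
        Name=CRAWLER_NAME,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
        # 00:00 UTC on the first day of every month.
        Schedule="cron(0 0 1 * ? *)",
    )
    glue.create_data_quality_ruleset(
        Name=RULESET_NAME,
        Ruleset=build_dqdl_ruleset(RULES),
        TargetTable={"DatabaseName": database, "TableName": table},
    )


def run_check(glue, role_arn, database, table):
    """Start one evaluation run of the ruleset and return its run ID."""
    response = glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": database, "TableName": table}},
        Role=role_arn,
        RulesetNames=[RULESET_NAME],
    )
    return response["RunId"]
```

Passing the client in as an argument keeps the sketch free of credentials and region settings. In practice the evaluation run would be triggered after the crawler finishes, and its results inspected before the data is loaded into QuickSight.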

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 146

Answer options

Correct answer: A

Explanation