AWS Certified Data Engineer – Associate (DEA-C01) — Question 172

A company receives a data file from a partner each day in an Amazon S3 bucket. The company uses a daily AWS Glue extract, transform, and load (ETL) pipeline to clean and transform each data file. The output of the ETL pipeline is written to a CSV file named Daily.csv in a second S3 bucket.

Occasionally, the daily data file is empty or is missing values for required fields. When the file is missing data, the company can use the previous day's CSV file.

A data engineer needs to ensure that the previous day's data file is overwritten only if the new daily file is complete and valid.

Which solution will meet these requirements with the LEAST effort?

Answer options

Correct answer: B

Explanation

Option B is correct because AWS Glue Data Quality rules run inside the existing ETL pipeline and can check each incoming file for completeness (for example, that the file is not empty and that required fields are populated) before the output is written. If the rules fail, the job can skip the write, so the previous day's Daily.csv stays in place. The other options either require more complex implementations (A, C) or do not validate the data before it is overwritten (D).
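
To make the "validate first, overwrite only on success" idea concrete, here is a minimal sketch in plain Python. It is not the AWS Glue Data Quality API: in Glue the same checks would typically be written as DQDL rules such as RowCount > 0 and IsComplete on each required column. The column names (id, amount), the function names, and the local file path standing in for the second S3 bucket are all assumptions made for illustration.

```python
import csv
import io
import os
import tempfile

# Hypothetical required columns; the question does not name the real fields.
REQUIRED_FIELDS = ["id", "amount"]


def is_complete_and_valid(csv_text, required_fields=REQUIRED_FIELDS):
    """Return True only if the file has at least one data row and every
    required field is present and non-empty in every row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:  # roughly what a "RowCount > 0" rule would check
        return False
    for row in rows:
        for field in required_fields:
            value = row.get(field)  # None if the column or value is missing
            if value is None or value.strip() == "":
                return False  # roughly what an "IsComplete" rule would check
    return True


def publish_if_valid(csv_text, target_path):
    """Overwrite target_path only when validation passes.

    Returns True if the file was replaced, False if the previous file
    was left untouched."""
    if not is_complete_and_valid(csv_text):
        return False
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file first, then swap it in, so a failed write
    # cannot leave a half-written Daily.csv behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as handle:
        handle.write(csv_text)
    os.replace(tmp_path, target_path)
    return True
```

Calling publish_if_valid with an empty file, or with a row whose required field is blank, returns False and leaves the existing Daily.csv as it was. That is the behavior the question asks for.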