A company receives a daily file that contains customer data in .xls format. The company s…

Question

A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size.
A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file.
Which solution will meet this requirement with the LEAST operational effort?

Accepted Answer

Correct answer: D. Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.

DataBrew is a visual data preparation service, so the data engineer can apply an aggregate function such as COUNT_DISTINCT to the concatenated name column without writing or maintaining job code. Options A and C rely on Apache Spark jobs, which take more effort to set up and operate. Option B is feasible, but it requires setting up an AWS Glue crawler and writing Amazon Athena queries, so it involves more operational effort than DataBrew.
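To make the computation concrete, here is a minimal Python sketch of the logic the recipe performs: concatenate the first-name and last-name columns, then count the distinct results. It is not DataBrew code and does not call any AWS API. The column names, the space separator, and the sample rows are all assumptions made for illustration, since the question does not specify them.

```python
# Hypothetical sample rows; the real file's column names are not given in the question.
rows = [
    {"first_name": "Ana", "last_name": "Silva"},
    {"first_name": "Ben", "last_name": "Okafor"},
    {"first_name": "Ana", "last_name": "Silva"},   # same customer appears twice
    {"first_name": "Ana", "last_name": "Okafor"},
]


def count_distinct_customers(rows, first_col="first_name", last_col="last_name", sep=" "):
    """Concatenate the two name columns, then count the distinct results."""
    # A set keeps only one copy of each concatenated name.
    full_names = {f"{row[first_col]}{sep}{row[last_col]}" for row in rows}
    return len(full_names)


distinct_customers = count_distinct_customers(rows)
print(distinct_customers)
```

In DataBrew the same two steps are configured in the recipe rather than coded by hand, which is why option D carries the least operational effort.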

AWS Certified Data Engineer – Associate (DEA-C01) — Question 56

Answer options

Explanation