A company ingests data from multiple data sources and stores the data in an Amazon S3 buc…

Question

A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake. The company needs to identify matching records even when the records do not have a common unique identifier. Which solution will meet this requirement?

Accepted Answer

Correct answer: D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

FindMatches is a machine learning transform that runs as part of an AWS Glue job. It is designed to identify records that refer to the same entity even when they share no common unique identifier and their fields do not match exactly. You train it by labeling example record pairs as matches or non-matches, then apply the trained transform inside the ETL job. Option A, Amazon Macie, focuses on data security and privacy (discovering sensitive data in Amazon S3), not on matching records. Option B, the AWS Glue PySpark Filter class, keeps or drops records based on a predicate; it does not identify matches. Option C partitions the data on a unique identifier, which cannot help when the records have no such identifier.
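To make the idea of matching records without a shared key more concrete, here is a minimal local sketch. It is not FindMatches and does not call any AWS API: it swaps in plain string similarity from Python's standard `difflib` module as a stand-in, whereas FindMatches learns its matching behavior from labeled examples. The sample records, the `similarity` and `find_candidate_matches` helpers, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample records with no shared unique identifier.
records = [
    {"name": "Jonathan Smith", "city": "Seattle"},
    {"name": "Jon Smith", "city": "Seattle"},
    {"name": "Maria Garcia", "city": "Austin"},
]


def similarity(a, b):
    """Average the character-level similarity of the name and city fields."""
    fields = ("name", "city")
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_candidate_matches(rows, threshold=0.85):
    """Return index pairs whose similarity meets the (arbitrary) threshold."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similarity(rows[i], rows[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Pairs of record indexes that look like the same entity.
matches = find_candidate_matches(records)
print(matches)
```

A fixed threshold like this is brittle, and comparing every pair scales quadratically with the number of records. Those are two reasons a trained transform is a better fit for a data lake than hand-written similarity rules.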

AWS Certified Data Engineer – Associate (DEA-C01) — Question 121
