Question

A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information.
The data engineer must identify and remove duplicate information from the legacy application data.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.

B is correct because FindMatches is an AWS Glue ML transform built to identify duplicate or matching records, even when the records share no unique identifier and their fields do not match exactly. Because it runs inside a managed Glue ETL job, the data engineer does not have to write or maintain custom matching logic, which keeps operational overhead low. Option A requires manual coding, which increases complexity, and options C and D also depend on custom implementations instead of the built-in capabilities of AWS Glue.
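As a rough illustration of the step that follows the transform: FindMatches output includes a `match_id` column, and records it considers duplicates share the same value. The sketch below is plain Python rather than a Glue script, and the helper `keep_one_per_match` and the sample rows are hypothetical. In a real job the labeled rows would come from the FindMatches transform, and which record to keep per group is a business decision that this sketch simplifies to "the first one seen".

```python
from typing import Any, Dict, Hashable, Iterable, List, Set


def keep_one_per_match(rows: Iterable[Dict[str, Any]],
                       key: str = "match_id") -> List[Dict[str, Any]]:
    """Keep the first row seen for each match group (hypothetical helper)."""
    seen: Set[Hashable] = set()
    kept: List[Dict[str, Any]] = []
    for row in rows:
        group = row[key]
        if group not in seen:
            # First record in this match group: keep it and remember the group.
            seen.add(group)
            kept.append(row)
    return kept


# Hypothetical rows, shaped like transform output that already carries match_id.
labeled = [
    {"match_id": 1, "name": "Jane Doe", "email": "jane@example.com"},
    {"match_id": 1, "name": "Doe, Jane", "email": "jane@example.com"},
    {"match_id": 2, "name": "Raj Patel", "email": "raj@example.com"},
]

deduplicated = keep_one_per_match(labeled)
print(len(deduplicated))
```

The two "Jane Doe" variants share a `match_id`, so only the first survives. A production job would more likely apply the same idea with a Spark operation on the `match_id` column than with a Python loop.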

AWS Certified Data Engineer – Associate (DEA-C01) — Question 72
