AWS Certified Data Engineer – Associate (DEA-C01) — Question 254

An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously.

A data engineer needs to deduplicate the semi-structured data by removing duplicate records, including duplicates that differ only by common misspellings.

Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: A

Explanation

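The correct answer is A because the AWS Glue FindMatches ML transform is built for this task. It uses machine learning to identify records that refer to the same entity even when no fields match exactly, so it covers both exact duplicates and misspelled variants. Because AWS Glue is serverless and the transform replaces hand-written matching rules, the operational overhead stays low. Options B and C involve more complex operations and do not address the deduplication requirement as directly as FindMatches. Option D focuses on preventing new duplicates rather than removing the ones that already exist.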
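
To see why exact-match deduplication is not enough for this scenario, the following minimal sketch may help. It is not FindMatches and does not call any AWS API: it swaps in plain string similarity from Python's standard library (difflib) as a stand-in for the learned matching that FindMatches performs. The sample records, the dedupe_fuzzy helper, and the 0.85 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample data: ids 1 and 2 are exact duplicates,
# id 3 is a misspelled variant of the same name.
records = [
    {"id": 1, "name": "Acme Investments"},
    {"id": 2, "name": "Acme Investments"},
    {"id": 3, "name": "Acme Investmnets"},
    {"id": 4, "name": "Borealis Capital"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe_fuzzy(rows, key, threshold=0.85):
    """Keep the first record of each group of near-identical values.

    The threshold is an assumed value for this sketch, not a
    recommended setting. The loop compares every record against
    every kept record, so it only suits small inputs.
    """
    kept = []
    for row in rows:
        is_duplicate = any(
            similarity(row[key], seen[key]) >= threshold for seen in kept
        )
        if not is_duplicate:
            kept.append(row)
    return kept


# Exact matching still sees three distinct names, because the
# misspelled variant does not equal the correct spelling.
exact_distinct = {row["name"] for row in records}

# Fuzzy matching folds the misspelled variant into the first record.
fuzzy_ids = [row["id"] for row in dedupe_fuzzy(records, "name")]

print(len(exact_distinct), fuzzy_ids)
```

A hand-rolled approach like this needs a tuned threshold per field and does not scale to a continuously growing dataset, which is the kind of operational overhead the managed FindMatches transform is meant to avoid.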