Question

An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously. A data engineer needs to deduplicate the semi-structured data, remove records that are duplicates, and remove common misspellings of duplicates. Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: A. Use the FindMatches feature of AWS Glue to remove duplicate records. FindMatches is a managed machine learning transform in AWS Glue that identifies records referring to the same entity even when their fields do not match exactly, so it handles both exact duplicates and common misspellings. Because the matching logic is provided by the service rather than written and maintained by hand, it carries the least operational overhead. Options B and C involve more complex operations that address the deduplication requirement less directly, and option D focuses on preventing new duplicates rather than removing existing ones.
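To make the accepted answer more concrete, here is a minimal sketch of how a FindMatches transform might be registered with the boto3 Glue client. The database name, table name, role ARN, primary key column, and both helper functions are hypothetical placeholders, not details from the question. The tradeoff values of 0.5 are simply a neutral starting point.

```python
def build_find_matches_request(database, table, role_arn, primary_key):
    """Assemble keyword arguments for the Glue CreateMLTransform API.

    All argument values are supplied by the caller; nothing here is
    taken from the exam question itself.
    """
    return {
        "Name": f"{table}-find-matches",
        # Glue Data Catalog table that holds the records to deduplicate.
        "InputRecordTables": [
            {"DatabaseName": database, "TableName": table}
        ],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                # Column that uniquely identifies each input row.
                "PrimaryKeyColumnName": primary_key,
                # 0.0 favors recall, 1.0 favors precision.
                "PrecisionRecallTradeoff": 0.5,
                # 0.0 favors lower cost, 1.0 favors accuracy.
                "AccuracyCostTradeoff": 0.5,
            },
        },
        # IAM role that Glue assumes when running the transform.
        "Role": role_arn,
    }


def create_find_matches_transform(request):
    """Submit the request to AWS Glue and return the new transform ID.

    Requires boto3 and valid AWS credentials, so it is not called below.
    """
    import boto3  # imported lazily so the sketch loads without boto3

    glue = boto3.client("glue")
    return glue.create_ml_transform(**request)["TransformId"]


# Hypothetical values for illustration only.
request = build_find_matches_request(
    database="investments_db",
    table="client_records",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    primary_key="record_id",
)
```

Creating the transform is only the first step. A FindMatches transform generally needs to be taught with labeled examples of matching and non-matching records before it is applied to the dataset in a Glue job.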

AWS Certified Data Engineer – Associate (DEA-C01) — Question 254
