AWS Certified Machine Learning – Specialty — Question 158

A retail company wants to combine its customer orders with the product description data from its product catalog. The structure and format of the records in each dataset is different. A data analyst tried to use a spreadsheet to combine the datasets, but the effort resulted in duplicate records and records that were not properly combined. The company needs a solution that it can use to combine similar records from the two datasets and remove any duplicates.
Which solution will meet these requirements?

Answer options

Correct answer: C

Explanation

The correct answer is C because AWS Glue's FindMatches transform is specifically designed to identify and merge similar records while eliminating duplicates, which fits the company's needs perfectly. Option A lacks the sophistication required for data cleansing, while B focuses on searching rather than merging, and D does not utilize AWS Glue's tailored capabilities for this specific data cleansing task.