AWS Certified Data Analytics – Specialty — Question 127

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier.
Which solution meets this requirement?

Answer options

Correct answer: D

Explanation

The correct answer is D. The AWS Glue FindMatches ML transform is designed to identify records that refer to the same entity even when they share no common unique identifier and their fields do not match exactly. It is trained on labeled examples of matching and non-matching records and can then be run as part of the AWS Glue ETL job. The other options do not perform record matching: Amazon Macie focuses on data security and the discovery of sensitive data, the PySpark Filter class only keeps or drops records based on a condition, and partitioning the data by patient name changes how it is laid out in S3 without linking records that belong to the same patient.
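
To make the idea of matching without a shared identifier concrete, here is a minimal, self-contained sketch in plain Python. It is not the FindMatches transform and does not use any AWS API: it swaps in simple string similarity from the standard library's `difflib` in place of a trained ML model. The record fields (`name`, `dob`, `zip`), the helper names, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_score(r1: dict, r2: dict, fields: list) -> float:
    """Average the per-field similarity across the compared fields."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)


def find_candidate_pairs(records: list, fields: list, threshold: float = 0.85) -> list:
    """Compare every pair of records and keep pairs scoring at or above the threshold.

    This is a naive O(n^2) comparison, suitable only for tiny examples.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_score(records[i], records[j], fields) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical patient records with no shared unique identifier.
records = [
    {"name": "Jonathan Smith", "dob": "1980-04-12", "zip": "98101"},
    {"name": "Jonathon Smith", "dob": "1980-04-12", "zip": "98101"},
    {"name": "Maria Garcia", "dob": "1975-11-30", "zip": "73301"},
]

pairs = find_candidate_pairs(records, ["name", "dob", "zip"])
print(pairs)  # the two "Smith" records are paired despite the spelling difference
```

The fixed threshold and equal field weights are the weak point of a hand-written approach like this one. A trained transform such as FindMatches instead learns from labeled examples which differences matter, which is why the question points to it rather than to filtering or partitioning.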