A healthcare company ingests patient data from multiple data sources and stores it in an…

Question

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier.
Which solution meets this requirement?

Accepted Answer

Correct answer: D. Train and use the AWS Glue FindMatches ML transform in the ETL job.

FindMatches is built to identify records that refer to the same entity even when they share no common unique identifier and their fields do not match exactly; it learns how to match from labeled examples. None of the other options performs record matching: Amazon Macie discovers and protects sensitive data, the PySpark Filter class only filters records, and partitioning by patient name changes the storage layout without linking records.
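To make the idea of matching without a shared key concrete, here is a minimal sketch in plain Python. It is not FindMatches and does not call any AWS API: it swaps in simple string similarity from the standard library's `difflib`, whereas the real transform is trained on labeled data and runs inside the Glue job. The sample records, the field names (`name`, `dob`, `zip`), and the 0.85 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical patient records from two sources; note there is no shared ID column.
source_a = [
    {"name": "Jonathan Smith", "dob": "1980-04-12", "zip": "98101"},
    {"name": "Maria Garcia", "dob": "1975-11-02", "zip": "73301"},
]
source_b = [
    {"name": "Jon Smith", "dob": "1980-04-12", "zip": "98101"},
    {"name": "Mariah Garza", "dob": "1991-06-30", "zip": "10001"},
]

FIELDS = ("name", "dob", "zip")  # assumed columns to compare


def similarity(rec1, rec2):
    """Average per-field string similarity, a value between 0 and 1."""
    scores = [
        SequenceMatcher(None, rec1[f].lower(), rec2[f].lower()).ratio()
        for f in FIELDS
    ]
    return sum(scores) / len(scores)


def find_matches(left, right, threshold=0.85):
    """Return (left_index, right_index, score) for pairs scoring at or above threshold."""
    matches = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            score = similarity(a, b)
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches


matches = find_matches(source_a, source_b)
print(matches)
```

With this sample data, only the two "Smith" records clear the threshold, even though their names differ. A hand-tuned threshold like this is brittle on real data, which is the gap a trained transform such as FindMatches is meant to close.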

AWS Certified Data Analytics – Specialty — Question 127
