AWS Certified Data Engineer – Associate (DEA-C01) — Question 200
A data engineer has two datasets that contain sales information for multiple cities and states. One dataset is named reference, and the other dataset is named primary.
The data engineer needs a solution to determine whether a specific set of values in the city and state columns of the primary dataset exactly match the same specific values in the reference dataset. The data engineer wants to use Data Quality Definition Language (DQDL) rules in an AWS Glue Data Quality job.
Which rule will meet these requirements?
Answer options
- A. DatasetMatch "reference" "city->ref_city, state->ref_state" = 1.0
- B. ReferentialIntegrity "city,state" "reference.{ref_city,ref_state}" = 1.0
- C. DatasetMatch "reference" "city->ref_city, state->ref_state" = 100
- D. ReferentialIntegrity "city,state" "reference.{ref_city,ref_state}" = 100
Correct answer: B
Explanation
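The correct answer is B. The ReferentialIntegrity rule checks to what extent the values of a set of columns in the primary dataset (city, state) are a subset of the values of a set of columns in the reference dataset (ref_city, ref_state). The threshold = 1.0 requires every city and state combination in the primary dataset to be present in the reference dataset. Options A and C are incorrect because DatasetMatch joins the two datasets on key column mappings and checks whether the data in the primary dataset matches the data in the reference dataset, which is a dataset-level comparison rather than a check that specific column values exist in the reference dataset. Option C also compares against 100, which is outside the 0.0 to 1.0 range of the match ratio. Option D is incorrect because the ReferentialIntegrity threshold is likewise a ratio between 0.0 and 1.0, so a full match is written as 1.0, not 100.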
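The ratio that option B compares against 1.0 can be illustrated with a small plain-Python sketch. This is not the AWS Glue implementation: the function name, the sample rows, and the choice to count matches per row are assumptions made only to show why the threshold is a value between 0.0 and 1.0.

```python
# Illustrative only: emulates the kind of ratio a ReferentialIntegrity rule
# compares against its threshold. Not AWS Glue code.

def referential_integrity_ratio(primary_rows, reference_rows, primary_cols, reference_cols):
    """Return the fraction of primary rows whose column values appear in the reference rows."""
    # Collect every (ref_city, ref_state)-style combination present in the reference dataset.
    reference_values = {tuple(row[c] for c in reference_cols) for row in reference_rows}
    if not primary_rows:
        return 1.0  # assumption: an empty primary dataset counts as fully matching
    matched = sum(
        1 for row in primary_rows
        if tuple(row[c] for c in primary_cols) in reference_values
    )
    return matched / len(primary_rows)


# Hypothetical sample data.
reference = [
    {"ref_city": "Seattle", "ref_state": "WA"},
    {"ref_city": "Portland", "ref_state": "OR"},
]
primary = [
    {"city": "Seattle", "state": "WA", "sales": 120},
    {"city": "Portland", "state": "OR", "sales": 80},
    {"city": "Portland", "state": "WA", "sales": 15},  # this pair is not in reference
]

ratio = referential_integrity_ratio(
    primary, reference, ["city", "state"], ["ref_city", "ref_state"]
)
rule_passes = ratio == 1.0  # mirrors the "= 1.0" threshold in option B
print(f"ratio={ratio:.2f}, passes '= 1.0': {rule_passes}")
```

Because the result is a fraction of matching rows, it can never reach 100, which is why the thresholds in options C and D cannot express a full match.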