A company has a business unit uploading .csv files to an Amazon S3 bucket. The company's…

Question

A company has a business unit uploading .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to do discovery, and create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creating the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason in a day, duplicate records are introduced into the Amazon Redshift table.
Which solution will update the Redshift table without duplicates when jobs are rerun?

Accepted Answer

Correct answer: A. Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.

Option A is correct because each run loads its rows into a staging table first, and the postactions SQL then replaces the matching rows in the main table with the staged rows. A rerun therefore overwrites the rows written by the earlier run instead of appending a second copy of them. Options B and C offer alternative methods, but they involve additional complexity or may not fully prevent duplicates. Option D does not address duplicates directly; it focuses on value selection rather than data integrity.
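
The staging-table approach in option A could look roughly like the sketch below. It is a sketch, not the job from the question. The helper names (`build_merge_options`, `write_with_staging`), the table names, and the `order_id` key column are hypothetical. It assumes the target table already exists in Redshift and that one column identifies a row uniquely. It relies on the `preactions` and `postactions` connection options and on `write_dynamic_frame.from_jdbc_conf`, which AWS Glue provides for Redshift writes.

```python
def build_merge_options(database, target_table, staging_table, key_column):
    """Build connection options that load a staging table, then merge it into the target.

    All arguments are trusted identifiers chosen by the job author (hypothetical
    names in this sketch); they are interpolated into SQL without escaping.
    """
    # Runs before the load: recreate an empty staging table with the
    # same columns as the target (assumes the target already exists).
    preactions = (
        f"DROP TABLE IF EXISTS {staging_table}; "
        f"CREATE TABLE {staging_table} (LIKE {target_table});"
    )
    # Runs after the load, in one transaction: delete target rows that the
    # staging table is about to replace, insert the staged rows, clean up.
    postactions = (
        "BEGIN; "
        f"DELETE FROM {target_table} USING {staging_table} "
        f"WHERE {target_table}.{key_column} = {staging_table}.{key_column}; "
        f"INSERT INTO {target_table} SELECT * FROM {staging_table}; "
        f"DROP TABLE {staging_table}; "
        "END;"
    )
    return {
        "database": database,
        "dbtable": staging_table,  # Glue writes to the staging table, not the target
        "preactions": preactions,
        "postactions": postactions,
    }


def write_with_staging(glue_context, frame, catalog_connection, redshift_tmp_dir, options):
    """Write a DynamicFrame to Redshift through the staging table described by options."""
    return glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection=catalog_connection,
        connection_options=options,
        redshift_tmp_dir=redshift_tmp_dir,
    )


# Example with hypothetical names; in a real job, pass the result to write_with_staging.
options = build_merge_options("dev", "public.sales", "public.sales_stage", "order_id")
```

Because the delete and the insert run inside one transaction, a rerun on the same day replaces the rows from the earlier run rather than adding to them. If no single column identifies a row, the `WHERE` clause would need to compare every column that makes up the key.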

AWS Certified Data Analytics – Specialty — Question 1
