AWS Certified Data Analytics – Specialty — Question 1

A company has a business unit uploading .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to discover the data and create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason on the same day, duplicate records are introduced into the Amazon Redshift table.
Which solution will update the Redshift table without duplicates when jobs are rerun?

Answer options

Correct answer: A

Explanation

Option A is correct because it loads each run's rows into a staging table first and then merges them into the main table, so a rerun replaces the matching rows instead of appending them again. This matters because Amazon Redshift does not enforce primary key or unique constraints, so a plain insert of the same data always produces duplicate rows. Options B and C offer alternative methods, but they involve additional complexity or may not fully prevent duplicates. Option D does not address duplicate rows directly; it focuses on value selection rather than on data integrity.
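
The staging-table approach can be sketched as follows. This is a minimal illustration, not the exam's reference solution: it only builds the SQL that a Glue job could hand to its Redshift writer through the "preactions" and "postactions" connection options. The function name, the table names (public.sales, public.sales_stage), and the key column (order_id) are hypothetical, and the sketch assumes the main table already exists with the same column layout as the incoming data.

```python
from typing import Dict, List


def build_redshift_merge_options(
    database: str,
    target_table: str,
    staging_table: str,
    key_columns: List[str],
) -> Dict[str, str]:
    """Build Glue connection options that load a staging table, then merge it.

    All table and column names are hypothetical; the target table is assumed
    to exist already and to share its column layout with the incoming data.
    """
    if not key_columns:
        raise ValueError("at least one key column is required to match rows")

    # Rows match when every key column is equal in both tables.
    match = " AND ".join(
        f"{target_table}.{col} = {staging_table}.{col}" for col in key_columns
    )

    # Runs before the load: start each run from an empty staging table.
    preactions = (
        f"DROP TABLE IF EXISTS {staging_table}; "
        f"CREATE TABLE {staging_table} (LIKE {target_table});"
    )

    # Runs after the load: replace matching rows inside one transaction.
    postactions = (
        "BEGIN; "
        f"DELETE FROM {target_table} USING {staging_table} WHERE {match}; "
        f"INSERT INTO {target_table} SELECT * FROM {staging_table}; "
        f"DROP TABLE {staging_table}; "
        "END;"
    )

    return {
        "database": database,
        "dbtable": staging_table,  # Glue writes to staging, not the main table
        "preactions": preactions,
        "postactions": postactions,
    }


options = build_redshift_merge_options(
    database="dev",
    target_table="public.sales",
    staging_table="public.sales_stage",
    key_columns=["order_id"],
)
print(options["postactions"])
```

Inside a Glue job, a dictionary like this would typically be passed as connection_options to glueContext.write_dynamic_frame.from_jdbc_conf, together with a catalog connection name and a redshift_tmp_dir; those values depend on the environment and are not shown here. Because the delete and insert run in one transaction, rerunning the job with the same input leaves the main table with one copy of each keyed row.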