AWS Certified Data Engineer – Associate (DEA-C01) — Question 120

A company uploads .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.

An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

Answer options

Correct answer: A

Explanation

Option A is correct because it loads the incoming rows into a staging table and then merges them into the target table, so a rerun updates existing records instead of inserting duplicates. The other options fall short: B does not update the Redshift tables directly, C relies on a method that may not prevent duplicates, and D does not perform updates at all.
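
The staging-table pattern can be sketched as follows. This is a minimal illustration, not the exam's reference solution. The helper name build_upsert_options, the table names, the key column order_id, the database dev, and the connection name are all hypothetical. It assumes the target table already exists and that the key columns uniquely identify a row. The helper only builds the SQL that a Glue job could pass as the "preactions" and "postactions" connection options when writing to Redshift.

```python
def build_upsert_options(target, staging, key_columns, database):
    """Build Redshift connection options for a staging-table merge.

    All arguments are illustrative; nothing here comes from the question text.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")

    # Match staged rows to existing target rows on the key columns.
    join = " AND ".join(f"{staging}.{col} = {target}.{col}" for col in key_columns)

    # Runs before the load: create an empty staging table shaped like the target.
    preactions = (
        f"DROP TABLE IF EXISTS {staging}; "
        f"CREATE TABLE {staging} AS SELECT * FROM {target} WHERE 1 = 2;"
    )

    # Runs after the load: replace matching rows in one transaction.
    postactions = (
        "BEGIN; "
        f"DELETE FROM {target} USING {staging} WHERE {join}; "
        f"INSERT INTO {target} SELECT * FROM {staging}; "
        f"DROP TABLE {staging}; "
        "END;"
    )

    return {
        "database": database,
        "dbtable": staging,  # the Glue job writes to the staging table
        "preactions": preactions,
        "postactions": postactions,
    }


options = build_upsert_options(
    target="public.orders",
    staging="public.orders_stage",
    key_columns=["order_id"],
    database="dev",
)

# Inside a Glue job, the options would be passed to the Redshift writer,
# for example (connection name and temp dir are placeholders):
#
# glueContext.write_dynamic_frame.from_jdbc_conf(
#     frame=mapped_frame,
#     catalog_connection="my-redshift-connection",
#     connection_options=options,
#     redshift_tmp_dir=args["TempDir"],
# )
```

Because the delete and insert run inside one transaction, rerunning the job replaces rows that share a key rather than appending them a second time.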