AWS Certified Data Engineer – Associate (DEA-C01) — Question 95
A retail company stores transactions, store locations, and customer information tables on four reserved ra3.4xlarge Amazon Redshift cluster nodes. All three tables use EVEN distribution.
The company updates the store location table only once or twice every few years.
A data engineer notices that Redshift queues are slowing down because the whole store location table is constantly being broadcast to all four compute nodes for most queries. The data engineer wants to speed up the query performance by minimizing the broadcasting of the store location table.
Which solution will meet these requirements in the MOST cost-effective way?
Answer options
- A. Change the distribution style of the store location table from EVEN distribution to ALL distribution.
- B. Change the distribution style of the store location table to KEY distribution based on the column that has the highest dimension.
- C. Add a join column named store_id into the sort key for all the tables.
- D. Upgrade the Redshift reserved node to a larger instance size in the same instance family.
Correct answer: A
Explanation
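Option A is correct. With ALL distribution, Redshift stores a full copy of the store location table on every compute node, so joins against it run locally and the table no longer has to be broadcast at query time. The trade-off is that ALL distribution multiplies the table's storage by the number of nodes and makes loads and updates slower, but that matters little here because the table changes only once or twice every few years. Option B does not reliably help: KEY distribution avoids data movement only when both sides of a join are distributed on the same join column, and the other tables would still use EVEN distribution. Option C does not help because sort keys control how rows are ordered on disk, not which node holds them, so they do not reduce broadcasting. Option D adds cost without addressing the root cause, because larger nodes would still receive the broadcast table.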
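
The difference between EVEN and ALL distribution can be illustrated with a small simulation. This is a simplified sketch, not Redshift's actual planner: it assumes a hypothetical 100-row table, round-robin placement for EVEN, and counts how many rows each node would have to receive from other nodes to hold the whole table for a join. The table name `store_location` in the SQL string is also an assumption.

```python
NODES = 4  # four compute nodes, as in the question


def distribute_even(rows, nodes=NODES):
    """Place rows round-robin across nodes, as EVEN distribution does."""
    placement = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        placement[i % nodes].append(row)
    return placement


def distribute_all(rows, nodes=NODES):
    """Place a full copy of the table on every node, as ALL distribution does."""
    return [list(rows) for _ in range(nodes)]


def rows_to_broadcast(placement, total_rows):
    """Count rows that nodes are missing and would need to receive at query time."""
    return sum(total_rows - len(node_rows) for node_rows in placement)


def rows_stored(placement):
    """Count rows stored across the whole cluster."""
    return sum(len(node_rows) for node_rows in placement)


# Hypothetical store location table with 100 rows.
stores = [f"store_{i}" for i in range(100)]

even_placement = distribute_even(stores)
all_placement = distribute_all(stores)

even_cost = rows_to_broadcast(even_placement, len(stores))
all_cost = rows_to_broadcast(all_placement, len(stores))

# Statement that applies option A; the table name is hypothetical.
ALTER_SQL = "ALTER TABLE store_location ALTER DISTSTYLE ALL;"

print("EVEN: rows moved per join =", even_cost, "| rows stored =", rows_stored(even_placement))
print("ALL:  rows moved per join =", all_cost, "| rows stored =", rows_stored(all_placement))
```

Under these assumptions, EVEN distribution moves 300 rows for every join while ALL distribution moves none, at the price of storing 400 rows instead of 100. That storage and write overhead is paid once per load, which suits a table that is almost never updated.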