AWS Certified Data Analytics – Specialty — Question 109
A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company's data. The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column. The tables are over 100 GB in size. The majority of queries run on both tables.
Which distribution style should the company use for the two tables to achieve optimal query performance?
Answer options
- A. An EVEN distribution style for both tables
- B. A KEY distribution style for both tables
- C. An ALL distribution style for the product table and an EVEN distribution style for the transactions table
- D. An EVEN distribution style for the product table and a KEY distribution style for the transactions table
Correct answer: B
Explanation
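Using a KEY distribution style with product_sku as the distribution key on both tables (DISTSTYLE KEY DISTKEY (product_sku)) makes Redshift place rows with the same product_sku value on the same node slice. Joins on product_sku are then collocated: each slice joins its own rows without redistributing or broadcasting either table across the network. That matters here because both tables exceed 100 GB and most queries use both of them. EVEN distribution (used in options A, C, and D) assigns rows round-robin without regard to column values, so matching rows usually sit on different nodes and must be moved at query time. ALL distribution (option C) avoids that movement for the product table, but it stores a full copy of the table on every node. ALL is intended for small, slowly changing dimension tables, not a table larger than 100 GB. Option B is therefore the best choice.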
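To show why collocation matters, here is a toy Python model of row placement. It is a sketch under simplifying assumptions, not Redshift internals. It assumes one slice per node on the 10-node cluster and uses `zlib.crc32` modulo the node count as a stand-in for Redshift's internal hash function. EVEN is modeled as plain round-robin. The helper names (`key_node`, `place`, `remote_join_pairs`) are hypothetical. The model counts the matching product/transaction pairs that land on different nodes, which approximates the rows that would have to cross the network during a join on product_sku.

```python
"""Toy model of row placement for a join on product_sku.

Assumptions (illustrative only, not Redshift internals):
- 10 nodes with one slice each;
- KEY placement = zlib.crc32(key) % node count (Redshift's real hash is internal);
- EVEN placement = round-robin by insertion order.
"""
import zlib

NUM_NODES = 10


def key_node(sku: str) -> int:
    # The same key value always maps to the same node: the basis of collocated joins.
    return zlib.crc32(sku.encode()) % NUM_NODES


def even_node(row_index: int) -> int:
    # Round-robin placement ignores column values entirely.
    return row_index % NUM_NODES


def place(skus, style):
    """Return (sku, node) pairs for rows loaded with the given distribution style."""
    if style == "KEY":
        return [(sku, key_node(sku)) for sku in skus]
    if style == "EVEN":
        return [(sku, even_node(i)) for i, sku in enumerate(skus)]
    raise ValueError(f"unsupported style: {style}")


def remote_join_pairs(products, transactions):
    """Count matching (product, transaction) pairs stored on different nodes."""
    nodes_by_sku = {}
    for sku, node in products:
        nodes_by_sku.setdefault(sku, []).append(node)
    remote = 0
    for sku, t_node in transactions:
        for p_node in nodes_by_sku.get(sku, []):
            if p_node != t_node:
                remote += 1  # this pair would need data movement at query time
    return remote


if __name__ == "__main__":
    product_skus = [f"SKU-{i:04d}" for i in range(1000)]
    # Hypothetical data: each product appears in five transactions.
    transaction_skus = [sku for sku in product_skus for _ in range(5)]

    for p_style, t_style in [("KEY", "KEY"), ("EVEN", "KEY"), ("EVEN", "EVEN")]:
        remote = remote_join_pairs(
            place(product_skus, p_style), place(transaction_skus, t_style)
        )
        print(f"product={p_style:4} transactions={t_style:4} "
              f"remote pairs: {remote} of {len(transaction_skus)}")
```

With this toy data, the KEY/KEY pairing reports 0 remote pairs because both tables hash the same product_sku to the same node. The EVEN/EVEN pairing reports 4400 of 5000, and the mixed EVEN/KEY pairing (option D) also leaves many matches on different nodes. The model does not cover option C. ALL would also make the joins local, but only by storing a full copy of the 100+ GB product table on each of the 10 nodes.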