AWS Certified Data Engineer – Associate (DEA-C01) — Question 239
A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. The company merges with a second company. The original company modifies a query that aggregates sales revenue data to join sales tables from both companies. The sales table for the first company is named Table S1. The sales table for the second company is named Table S2. Table S1 contains 10 billion records. Table S2 contains 900 million records.
The query becomes slow after the modification. A data engineer must improve the query performance.
Which solutions will meet these requirements? (Choose two.)
Answer options
- A. Use the KEY distribution style for both sales tables. Select a low cardinality column to use for the join.
- B. Use the KEY distribution style for both sales tables. Select a high cardinality column to use for the join.
- C. Use the EVEN distribution style for Table S1. Use the ALL distribution style for Table S2.
- D. Use the Amazon Redshift query optimizer to review and select optimizations to implement.
- E. Use Amazon Redshift Advisor to review and select optimizations to implement.
Correct answer: B, E
Explanation
Choosing option B is correct as using the KEY distribution style with a high cardinality column for the join can reduce data movement and improve performance in joins. Option E is also correct because Amazon Redshift Advisor can provide tailored recommendations for optimizing query performance. Options A, C, and D are incorrect as they either suggest low cardinality, inappropriate distribution styles, or a less effective optimization tool compared to the Advisor.