A company has a data processing pipeline that runs multiple SQL queries in sequence again…

Question

A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. The company merges with a second company. The original company modifies a query that aggregates sales revenue data to join sales tables from both companies. The sales table for the first company is named Table S1. The sales table for the second company is named Table S2. Table S1 contains 10 billion records. Table S2 contains 900 million records.
The query becomes slow after the modification. A data engineer must improve the query performance.

Accepted Answer

Correct answer: B, E. B. Use the KEY distribution style for both sales tables. Select a high cardinality column to use for the join. — E. Use Amazon Redshift Advisor to review and select optimizations to implement. — Choosing option B is correct as using the KEY distribution style with a high cardinality column for the join can reduce data movement and improve performance in joins. Option E is also correct because Amazon Redshift Advisor can provide tailored recommendations for optimizing query performance. Options A, C, and D are incorrect as they either suggest low cardinality, inappropriate distribution styles, or a less effective optimization tool compared to the Advisor.

AWS Certified Data Engineer – Associate (DEA-C01) — Question 239

Answer options

Correct answer: B, E

Explanation