AWS Certified Data Analytics – Specialty — Question 33
A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.
✑ A trips fact table for information on completed rides.
✑ A drivers dimension table for driver profiles.
✑ A customers fact table holding customer profile information.
The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.
What table design provides optimal query performance?
Answer options
- A. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
- B. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
- C. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
- D. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.
Correct answer: C
Explanation
Option C is the best choice because using DISTSTYLE KEY on the trips table with destination as the key allows for optimized joins and queries related to profitability. Sorting by date enhances performance for time-based analyses. Using DISTSTYLE ALL for the drivers table is effective since their data is static, while using DISTSTYLE EVEN for the frequently changing customers table balances data distribution, but does not optimize for query performance as effectively as option C.