AWS Certified Database – Specialty — Question 339
A company is using Amazon Redshift as its data warehouse solution. The Redshift cluster handles the following types of workloads:
✑ Real-time inserts through Amazon Kinesis Data Firehose
✑ Bulk inserts through COPY commands from Amazon S3
✑ Analytics through SQL queries
Recently, the cluster has started to experience performance issues.
Which combination of actions should a database specialist take to improve the cluster's performance? (Choose three.)
Answer options
- A. Modify the Kinesis Data Firehose delivery stream to stream the data to Amazon S3 with a high buffer size and to load the data into Amazon Redshift by using the COPY command.
- B. Stream real-time data into Redshift temporary tables before loading the data into permanent tables.
- C. For bulk inserts, split input files on Amazon S3 into multiple files to match the number of slices on Amazon Redshift. Then use the COPY command to load data into Amazon Redshift.
- D. For bulk inserts, use the parallel parameter in the COPY command to enable multi-threading.
- E. Optimize analytics SQL queries to use sort keys.
- F. Avoid using temporary tables in analytics SQL queries.
Correct answer: B, C, E
Explanation
Splitting the S3 input into multiple files, ideally a number that is a multiple of the number of slices in the cluster, lets the COPY command load the files in parallel so that every slice takes a share of the work. Ingesting real-time data into temporary staging tables first, and then loading it into the permanent tables, reduces lock contention and fragmentation on the primary target tables. Finally, writing analytics SQL queries to filter and join on the tables' sort keys lets Amazon Redshift skip data blocks that fall outside the requested range, which reduces disk I/O and speeds up query execution.
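The file-splitting idea behind option C can be sketched roughly as follows. This is a minimal illustration, not a production loader: the function names, the `part-NNNN.csv` naming scheme, the bucket, prefix, table, and IAM role ARN are all hypothetical, and the sketch only builds the parts and the COPY statement text without talking to S3 or Redshift.

```python
from typing import List


def split_into_parts(rows: List[str], num_slices: int) -> List[List[str]]:
    """Distribute rows round-robin into one part per slice (hypothetical helper)."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    parts: List[List[str]] = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        parts[i % num_slices].append(row)
    return parts


def part_keys(prefix: str, count: int) -> List[str]:
    """Build object keys that share one prefix (naming scheme is an assumption)."""
    return [f"{prefix}part-{n:04d}.csv" for n in range(count)]


def copy_statement(table: str, bucket: str, prefix: str, iam_role: str) -> str:
    """Build a COPY statement that loads every object under the shared prefix."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # Assumed cluster with 4 slices; all names below are placeholders.
    rows = [f"{i},item-{i}" for i in range(10)]
    parts = split_into_parts(rows, num_slices=4)
    for key, part in zip(part_keys("sales/", len(parts)), parts):
        print(key, len(part), "rows")
    print(copy_statement(
        "sales",
        "example-bucket",
        "sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

Because COPY treats the `FROM` value as a key prefix, a single statement picks up every part file under it. Parts of roughly equal size keep any one slice from becoming the bottleneck.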