AWS Certified Data Engineer – Associate (DEA-C01) — Question 64
A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3.
Which actions will provide the FASTEST queries? (Choose two.)
Answer options
- A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
- B. Use a columnar storage file format.
- C. Partition the data based on the most common query predicates.
- D. Split the data into files that are less than 10 KB.
- E. Use file formats that are not splittable.
Correct answer: B, C
Explanation
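Using a columnar storage file format such as Apache Parquet or ORC (option B) lets Redshift Spectrum read only the columns a query references instead of every column of every row, which reduces the amount of data read from Amazon S3. Partitioning the data on the columns that appear most often in query predicates (option C) lets Redshift Spectrum skip entire partitions that cannot match the WHERE clause, so much less data is scanned. The other options slow queries down. Gzip is not splittable, so each 1 GB to 5 GB file in option A must be read by a single worker instead of in parallel. Files smaller than 10 KB (option D) add per-file overhead, such as one Amazon S3 request per file, that dominates the small amount of data each file holds. Non-splittable file formats (option E) likewise prevent Redshift Spectrum from dividing a file among parallel readers.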
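The minimal Python sketch below makes the effect of options B and C concrete. It is a hypothetical toy model, not Redshift Spectrum's real scan or billing logic: the `DataFile` class, the `mb_scanned` function, the column names, and all sizes are invented for illustration. It counts how many megabytes a reader touches for one query, with and without column projection (columnar format) and partition pruning.

```python
from dataclasses import dataclass

# Toy model only: sizes are invented, and this is NOT how Redshift Spectrum
# actually meters or bills scans. It just counts the MB a reader would touch.


@dataclass
class DataFile:
    partition: str      # value of a hypothetical partition key (a date here)
    column_bytes: dict  # column name -> size in MB inside this file


def mb_scanned(files, wanted_columns, partition_filter=None, columnar=True):
    """Return the MB read for one query under the toy model.

    partition_filter: set of partition values kept by the WHERE clause,
        or None when the data is not partitioned on the predicate column.
    columnar: True models Parquet/ORC (only referenced columns are read);
        False models a row format such as CSV (every column is read).
    """
    total = 0
    for f in files:
        # Partition pruning: files in non-matching partitions are never opened.
        if partition_filter is not None and f.partition not in partition_filter:
            continue
        if columnar:
            # Column projection: read only the columns the query references.
            total += sum(f.column_bytes[c] for c in wanted_columns)
        else:
            # Row format: every column of every row is read.
            total += sum(f.column_bytes.values())
    return total


# Ten daily files, each with four columns of 100 MB (made-up numbers).
files = [
    DataFile(f"2024-01-{day:02d}",
             {"sale_date": 100, "customer": 100, "amount": 100, "notes": 100})
    for day in range(1, 11)
]

# Query shape: SELECT SUM(amount) ... WHERE sale_date = '2024-01-05'
cols = ["sale_date", "amount"]
one_day = {"2024-01-05"}

print(mb_scanned(files, cols, columnar=False))           # 4000: row format, no partitions
print(mb_scanned(files, cols, columnar=True))            # 2000: columnar only
print(mb_scanned(files, cols, one_day, columnar=False))  # 400: partitioned only
print(mb_scanned(files, cols, one_day, columnar=True))   # 200: columnar + partitioned
```

In this toy model each technique alone cuts the scan, and combining them reduces it twentyfold. Real savings depend on compression, file layout, and how selective the query predicates are.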