AWS Certified Database – Specialty — Question 302
A retail company uses Amazon Redshift Spectrum to run complex analytical queries on objects that are stored in an Amazon S3 bucket. The objects are joined with multiple dimension tables that are stored in an Amazon Redshift database. The company uses the database to create monthly and quarterly aggregated reports. Users who attempt to run queries are reporting the following error message: error: Spectrum Scan Error: Access throttled
Which solution will resolve this error?
Answer options
- A. Check file sizes of fact tables in Amazon S3, and look for large files. Break up large files into smaller files of equal size between 100 MB and 1 GB.
- B. Reduce the number of queries that users can run in parallel.
- C. Check file sizes of fact tables in Amazon S3, and look for small files. Merge the small files into larger files of at least 64 MB in size.
- D. Review and optimize queries that submit a large aggregation step to Redshift Spectrum.
Correct answer: C
Explanation
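The 'Access throttled' error occurs when Amazon Redshift Spectrum is slowed down by the service quotas of another AWS service, in this case Amazon S3. A fact table stored as many small files forces Redshift Spectrum to issue a large number of GET and HEAD requests against the same S3 prefix, and S3 throttles the scan when the request rate on that prefix becomes too high. Merging the small files into larger files of at least 64 MB (option C) lowers the number of requests needed to read the same amount of data. Splitting large files into smaller ones (option A) increases the request count instead of reducing it. Limiting query concurrency (option B) and tuning aggregation steps (option D) do not address the underlying cause, which is the request volume generated by the small-file layout.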
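As a rough illustration of the fix in option C, the sketch below takes a list of object keys and sizes and groups the undersized ones into merge batches that reach the 64 MB threshold. The function name `plan_merges` and the sample keys are hypothetical. In practice the listing would come from an S3 object listing, and the merge itself would be done by whatever job rewrites the fact table files, which is not shown here.

```python
from typing import List, Tuple

# 64 MB threshold taken from option C.
MIN_FILE_BYTES = 64 * 1024 * 1024


def plan_merges(objects: List[Tuple[str, int]],
                min_bytes: int = MIN_FILE_BYTES) -> List[List[str]]:
    """Group objects smaller than min_bytes into batches to merge.

    `objects` is a list of (key, size_in_bytes) pairs. Each returned batch
    is a list of keys whose combined size reaches min_bytes, except when
    all small files together are still below the threshold.
    """
    # Only files below the threshold are merge candidates.
    small = sorted((o for o in objects if o[1] < min_bytes), key=lambda o: o[0])

    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for key, size in small:
        current.append(key)
        current_bytes += size
        if current_bytes >= min_bytes:
            batches.append(current)
            current, current_bytes = [], 0

    if current:
        # Leftover files are too small to form a batch on their own,
        # so fold them into the last batch when one exists.
        if batches:
            batches[-1].extend(current)
        else:
            batches.append(current)
    return batches


if __name__ == "__main__":
    mb = 1024 * 1024
    # Hypothetical listing of one fact-table prefix.
    listing = [
        ("sales/part-0001", 10 * mb),
        ("sales/part-0002", 30 * mb),
        ("sales/part-0003", 30 * mb),
        ("sales/part-0004", 100 * mb),
        ("sales/part-0005", 5 * mb),
    ]
    for batch in plan_merges(listing):
        print(batch)
```

With the sample listing, the 100 MB object is left alone and the four small objects end up in a single merge batch, so a scan would read two objects instead of five.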