AWS Certified Solutions Architect – Associate (SAA-C02) — Question 416
A company is using Amazon Redshift for analytics and to generate customer reports. The company recently acquired 50 TB of additional customer demographic data. The data is stored in .csv files in Amazon S3. The company needs a solution that joins the data and visualizes the results with the least possible cost and effort.
What should a solutions architect recommend to meet these requirements?
Answer options
- A. Use Amazon Redshift Spectrum to query the data in Amazon S3 directly and join that data with the existing data in Amazon Redshift. Use Amazon QuickSight to build the visualizations.
- B. Use Amazon Athena to query the data in Amazon S3. Use Amazon QuickSight to join the data from Athena with the existing data in Amazon Redshift and to build the visualizations.
- C. Increase the size of the Amazon Redshift cluster, and load the data from Amazon S3. Use Amazon EMR Notebooks to query the data and build the visualizations in Amazon Redshift.
- D. Export the data from the Amazon Redshift cluster into Apache Parquet files in Amazon S3. Use Amazon Elasticsearch Service (Amazon ES) to query the data. Use Kibana to visualize the results.
Correct answer: A
Explanation
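Amazon Redshift Spectrum lets the existing cluster query the .csv files in place in Amazon S3 through external tables. The 50 TB does not have to be loaded into the cluster, and the cluster does not have to be resized. A single SQL statement can join a Spectrum external table with local Redshift tables, so the join stays in the warehouse the company already runs. Amazon QuickSight connects natively to Amazon Redshift and can visualize the joined results directly. The other options add cost or effort. Option B splits the work across Athena and Redshift and moves the join into QuickSight. Option C requires scaling the cluster to store an additional 50 TB. Option D exports data out of Redshift and requires provisioning and managing an Amazon Elasticsearch Service domain.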
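As a minimal sketch of the Spectrum approach, the Python snippet below only assembles the three SQL statements involved: an external schema backed by the AWS Glue Data Catalog, an external table defined over the .csv files in S3, and a query that joins that table with an existing cluster table. Every name in it is a hypothetical placeholder: the schema, Glue database, bucket, IAM role ARN, column list, and `public.customers`. The statements would still need to be run through a SQL client or the Redshift Data API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpectrumConfig:
    # All values are hypothetical placeholders for illustration only.
    external_schema: str = "spectrum_demo"
    catalog_database: str = "demographics_db"  # Glue Data Catalog database
    iam_role_arn: str = "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
    s3_location: str = "s3://example-bucket/demographics/"  # prefix holding the .csv files
    local_table: str = "public.customers"  # existing table inside the cluster


def create_schema_sql(cfg: SpectrumConfig) -> str:
    # Registers an external schema whose metadata lives in the Glue Data Catalog.
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {cfg.external_schema}\n"
        f"FROM DATA CATALOG DATABASE '{cfg.catalog_database}'\n"
        f"IAM_ROLE '{cfg.iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def create_table_sql(cfg: SpectrumConfig) -> str:
    # Describes the CSV layout in place; no data is copied into the cluster.
    return (
        f"CREATE EXTERNAL TABLE {cfg.external_schema}.customer_demographics (\n"
        "    customer_id BIGINT,\n"
        "    age_band VARCHAR(16),\n"
        "    region VARCHAR(64)\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{cfg.s3_location}'\n"
        "TABLE PROPERTIES ('skip.header.line.count'='1');"
    )


def join_sql(cfg: SpectrumConfig) -> str:
    # Spectrum reads and filters the S3 data; the join with the local table runs in the cluster.
    return (
        "SELECT c.customer_id, d.age_band, d.region\n"
        f"FROM {cfg.local_table} AS c\n"
        f"JOIN {cfg.external_schema}.customer_demographics AS d\n"
        "  ON c.customer_id = d.customer_id;"
    )


def build_statements(cfg: SpectrumConfig) -> list[str]:
    # Order matters: schema first, then the external table, then the query.
    return [create_schema_sql(cfg), create_table_sql(cfg), join_sql(cfg)]


if __name__ == "__main__":
    for stmt in build_statements(SpectrumConfig()):
        print(stmt, end="\n\n")
```

QuickSight could then use the Redshift cluster as its data source, for example by running the join query as a custom SQL dataset, so no data leaves the existing warehouse.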