AWS Certified Data Analytics – Specialty — Question 34
An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?
Answer options
- A. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
- B. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.
- C. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
- D. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
Correct answer: C
Explanation
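Option C is the correct choice. Amazon Redshift Spectrum can use the existing AWS Glue Data Catalog as an external schema, so the .csv data is queried in place in Amazon S3 without being loaded into the cluster. Spectrum performs the S3 scanning and filtering on its own separate compute layer rather than on the cluster's nodes, which limits the extra load on the busy cluster. It also needs little development: a one-time external schema definition plus a SQL join. The other options fall short of the requirements. A and B still run export queries on the heavily loaded cluster and require writing and maintaining Lambda or Glue code. D relies on Amazon EMR and Apache Sqoop, which add cluster management overhead and are not serverless.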
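A minimal sketch of the Spectrum approach, assuming the Glue Data Catalog database that already describes the S3 .csv files is mapped to an external schema (so the external objects point at the S3 data, while the call center table stays in the cluster), and that SQL is submitted through the Redshift Data API. Every schema, database, table, column, IAM role, and cluster name below is hypothetical. The `client` argument is expected to be `boto3.client("redshift-data")`; it is passed in so the SQL logic can be exercised without AWS access.

```python
import time
from typing import Any, List

# Terminal states reported by the Redshift Data API's describe_statement.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def build_spectrum_sql(external_schema: str, glue_database: str,
                       iam_role_arn: str, s3_table: str,
                       redshift_table: str, join_key: str) -> List[str]:
    """Return the SQL for the Spectrum approach (all names hypothetical)."""
    # One-time setup: expose the Glue Data Catalog database (which describes
    # the .csv files in S3) as an external schema queryable via Spectrum.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {external_schema} "
        f"FROM DATA CATALOG DATABASE '{glue_database}' "
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # Daily batch: join the external (S3) table with the in-cluster table.
    join_query = (
        f"SELECT s.*, r.* "
        f"FROM {external_schema}.{s3_table} AS s "
        f"JOIN {redshift_table} AS r ON s.{join_key} = r.{join_key};"
    )
    return [create_schema, join_query]


def run_and_wait(client: Any, sql: str, cluster_id: str, database: str,
                 db_user: str, poll_seconds: float = 2.0) -> str:
    """Submit one statement and block until it reaches a terminal state.

    Data API calls are asynchronous, so we poll describe_statement to make
    sure the schema exists before the join query is submitted.
    """
    resp = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    statement_id = resp["Id"]
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if status != "FINISHED":
        raise RuntimeError(f"statement {statement_id} ended with {status}")
    return statement_id
```

In practice the external schema would be created once, and only the join would be rerun by the daily batch job, for example by calling `run_and_wait` with the second statement and a `redshift-data` client.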