AWS Certified Machine Learning – Specialty — Question 289

A data engineer is preparing a dataset that a retail company will use to predict the number of visitors to stores. The data engineer created an Amazon S3 bucket. The engineer subscribed the S3 bucket to an AWS Data Exchange data product for general economic indicators. The data engineer wants to join the economic indicator data to an existing table in Amazon Athena to merge with the business data. All these transformations must finish running in 30-60 minutes.

Which solution will meet these requirements MOST cost-effectively?

Answer options

Correct answer: C

Explanation

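An S3 event notification that triggers an AWS Lambda function, which in turn starts an AWS Glue job, is the most cost-effective way to process the incoming AWS Data Exchange data. The pipeline is serverless and runs only when new data lands in the bucket, so there is no idle infrastructure to pay for. Lambda serves only as the trigger: a single Lambda invocation is limited to 15 minutes, so it cannot run the 30-60 minute transformations itself. AWS Glue is designed for batch ETL jobs of that duration and can read from and write to the S3 locations that back the Athena tables. Other options, such as provisioning an Amazon Redshift cluster or using Amazon SageMaker Data Wrangler, add infrastructure overhead and cost that this workload does not need.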
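
The trigger pattern described above can be sketched as a small Lambda handler in Python. This is a minimal illustration, not a reference implementation: the job name "join-economic-indicators", the GLUE_JOB_NAME environment variable, and the "--source_bucket" / "--source_key" job arguments are hypothetical names chosen for the example. The optional glue_client parameter is also an assumption added so the handler can be exercised without AWS credentials.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical Glue job name; in a real deployment this would come from configuration.
GLUE_JOB_NAME = os.environ.get("GLUE_JOB_NAME", "join-economic-indicators")


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def handler(event, context, glue_client=None):
    """Start one Glue job run per new object and return the job run IDs.

    The handler returns as soon as the runs are started; the long-running
    transformation happens inside Glue, not inside Lambda.
    """
    if glue_client is None:
        import boto3  # imported lazily so the code can be tested without AWS access
        glue_client = boto3.client("glue")

    run_ids = []
    for bucket, key in extract_s3_objects(event):
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Hypothetical argument names; the Glue script decides how to read them.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"job_run_ids": run_ids}
```

Because the handler only calls start_job_run and returns, the Lambda invocation finishes in seconds, while the Glue job is free to run for the full 30-60 minutes.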