AWS Certified Data Engineer – Associate (DEA-C01) — Question 22
A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.
Which solution will meet this requirement MOST cost-effectively?
Answer options
- A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.
- B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.
- C. Use Amazon Athena Federated Query to join the data from all data sources.
- D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.
Correct answer: C
Explanation
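The correct answer is C. Amazon Athena Federated Query uses data source connectors, which run on AWS Lambda, to query DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3 in place and join the results in a single SQL statement. Athena is serverless and bills per query, so a one-time job requires no cluster to provision and no data to copy beforehand. Option A requires a provisioned EMR cluster, which bills for its instances for as long as it runs and takes effort to set up for a single job. Option B adds export work for three sources and stores a duplicate copy of the data in S3, which increases storage cost. Option D does not work as described: Redshift Spectrum queries data in Amazon S3 and cannot read directly from DynamoDB or Amazon RDS.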
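To make option C concrete, the following is a minimal sketch of what a federated join might look like when submitted through the Athena API. It assumes that connectors for DynamoDB, RDS, and Redshift have already been deployed and registered as Athena data catalogs. Every catalog, database, table, and column name below is hypothetical, as is the S3 output location. Only `start_query_execution` and its `QueryString` and `ResultConfiguration` parameters come from the boto3 Athena client.

```python
def build_federated_join(dynamo_catalog, rds_catalog, redshift_catalog):
    """Return one SQL statement that joins four sources by catalog-qualified name.

    All database, table, and column names are hypothetical placeholders.
    "AwsDataCatalog" is the default Glue-backed catalog used for S3 tables.
    """
    return f"""
SELECT o.order_id, c.customer_name, p.product_name, e.event_type
FROM "{rds_catalog}"."sales"."orders" AS o
JOIN "{dynamo_catalog}"."default"."customers" AS c
  ON o.customer_id = c.customer_id
JOIN "{redshift_catalog}"."public"."products" AS p
  ON o.product_id = p.product_id
JOIN "AwsDataCatalog"."logs"."events" AS e
  ON o.order_id = e.order_id
""".strip()


def submit_query(athena_client, sql, output_location):
    """Start the query and return its execution ID.

    athena_client is expected to behave like boto3.client("athena").
    output_location is an S3 URI where Athena writes the result files.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

With real credentials, the call would look something like `submit_query(boto3.client("athena"), build_federated_join("ddb_catalog", "rds_catalog", "redshift_catalog"), "s3://example-bucket/athena-results/")`. The point of the sketch is that the join happens in one statement, with no export step and no cluster to manage.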