AWS Certified Data Engineer – Associate (DEA-C01) — Question 82
A retail company uses Amazon Aurora PostgreSQL to process and store live transactional data. The company uses an Amazon Redshift cluster for a data warehouse.
An extract, transform, and load (ETL) job runs every morning to update the Redshift cluster with new data from the PostgreSQL database. The company has grown rapidly and needs to cost-optimize the Redshift cluster.
A data engineer needs to create a solution to archive historical data. The data engineer must be able to run analytics queries that effectively combine data from live transactional data in PostgreSQL, current data in Redshift, and archived historical data. The solution must keep only the most recent 15 months of data in Amazon Redshift to reduce costs.
Which combination of steps will meet these requirements? (Choose two.)
Answer options
- A. Configure the Amazon Redshift Federated Query feature to query live transactional data that is in the PostgreSQL database.
- B. Configure Amazon Redshift Spectrum to query live transactional data that is in the PostgreSQL database.
- C. Schedule a monthly job to copy data that is older than 15 months to Amazon S3 by using the UNLOAD command. Delete the old data from the Redshift cluster. Configure Amazon Redshift Spectrum to access historical data in Amazon S3.
- D. Schedule a monthly job to copy data that is older than 15 months to Amazon S3 Glacier Flexible Retrieval by using the UNLOAD command. Delete the old data from the Redshift cluster. Configure Redshift Spectrum to access historical data from S3 Glacier Flexible Retrieval.
- E. Create a materialized view in Amazon Redshift that combines live, current, and historical data from different sources.
Correct answer: A, C
Explanation
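The question asks for two steps, and the correct combination is A and C. Option A is correct because Amazon Redshift Federated Query lets Redshift query live transactional data in the Aurora PostgreSQL database directly, without first loading it into the cluster. Option C is correct because the UNLOAD command exports data older than 15 months to Amazon S3, the old rows can then be deleted from the cluster to reduce storage costs, and Redshift Spectrum can still query the archived files in S3. Together, A and C allow a single Redshift query to combine live, current, and historical data. Option B is incorrect because Redshift Spectrum queries files in Amazon S3, not tables in a PostgreSQL database. Option D is incorrect because objects in S3 Glacier Flexible Retrieval are not available for immediate reads and must be restored first, so Redshift Spectrum cannot query them in place. Option E is incorrect because a materialized view stores its precomputed results in the Redshift cluster, so it neither archives historical data nor reduces the amount of data kept in Redshift.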
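The mechanics described in options A and C can be illustrated with a minimal Python sketch that only builds the Redshift SQL statements as strings. It is not a definitive implementation: the table `sales`, its columns, the schema names `live_pg` and `archive_s3`, the bucket, the endpoint, and all ARNs are hypothetical placeholders that do not come from the question, and the sketch assumes an external table for the archived files is already defined in the data catalog.

```python
from datetime import date

# Every identifier below (schemas, table, columns, bucket, ARNs, endpoint) is a
# hypothetical placeholder chosen for illustration, not taken from the question.
ROLE = "arn:aws:iam::123456789012:role/example-redshift-role"


def archive_cutoff(today: date, months_to_keep: int = 15) -> date:
    """Return the first day of the month `months_to_keep` months before `today`."""
    month_index = today.year * 12 + (today.month - 1) - months_to_keep
    return date(month_index // 12, month_index % 12 + 1, 1)


def federated_schema_sql() -> str:
    """Option A: expose Aurora PostgreSQL tables to Redshift via Federated Query."""
    return (
        "CREATE EXTERNAL SCHEMA live_pg\n"
        "FROM POSTGRES\n"
        "DATABASE 'retail' SCHEMA 'public'\n"
        "URI 'example-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com' PORT 5432\n"
        f"IAM_ROLE '{ROLE}'\n"
        "SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:example';"
    )


def unload_sql(cutoff: date) -> str:
    """Option C, step 1: export rows older than the cutoff to S3 as Parquet."""
    # Quotes inside the UNLOAD query text are escaped by doubling them.
    return (
        f"UNLOAD ('SELECT * FROM sales WHERE sale_date < ''{cutoff.isoformat()}''')\n"
        f"TO 's3://example-archive-bucket/sales/before_{cutoff.isoformat()}/'\n"
        f"IAM_ROLE '{ROLE}'\n"
        "FORMAT AS PARQUET;"
    )


def delete_sql(cutoff: date) -> str:
    """Option C, step 2: remove the exported rows from the cluster."""
    return f"DELETE FROM sales WHERE sale_date < '{cutoff.isoformat()}';"


def spectrum_schema_sql() -> str:
    """Option C, step 3: expose the archived S3 data through Redshift Spectrum."""
    return (
        "CREATE EXTERNAL SCHEMA archive_s3\n"
        "FROM DATA CATALOG\n"
        "DATABASE 'example_archive_db'\n"
        f"IAM_ROLE '{ROLE}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def combined_query_sql() -> str:
    """One query over live, current, and archived rows.

    De-duplication between live rows and rows already loaded by the morning
    ETL job is deliberately left out of this sketch.
    """
    return (
        "SELECT order_id, sale_date, amount FROM live_pg.sales\n"
        "UNION ALL\n"
        "SELECT order_id, sale_date, amount FROM public.sales\n"
        "UNION ALL\n"
        "SELECT order_id, sale_date, amount FROM archive_s3.sales;"
    )


if __name__ == "__main__":
    cutoff = archive_cutoff(date.today())
    for statement in (federated_schema_sql(), unload_sql(cutoff),
                      delete_sql(cutoff), spectrum_schema_sql(),
                      combined_query_sql()):
        print(statement, end="\n\n")
```

Computing the cutoff once and passing it to both `unload_sql` and `delete_sql` keeps the exported rows and the deleted rows on the same boundary, which matters if the monthly job runs across midnight. In practice the DELETE should run only after the UNLOAD has been confirmed to succeed.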