AWS Certified Data Engineer – Associate (DEA-C01) — Question 205
A data engineer needs to optimize the performance of a data pipeline that handles retail orders. Data about the orders is ingested daily into an Amazon S3 bucket.
The data engineer runs queries once each week to extract metrics from the orders data based on the order date for multiple date ranges. The data engineer needs an optimization solution that ensures the query performance will not degrade when the volume of data increases.
Which solution will meet this requirement MOST cost-effectively?
Answer options
- A. Partition the data based on order date. Use Amazon Athena to query the data.
- B. Partition the data based on order date. Use Amazon Redshift to query the data.
- C. Partition the data based on load date. Use Amazon EMR to query the data.
- D. Partition the data based on load date. Use Amazon Aurora to query the data.
Correct answer: A
Explanation
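The correct answer is A. Partitioning the data by order date matches the column that the weekly queries filter on, so Amazon Athena can prune partitions and scan only the requested date ranges. Query performance therefore depends on the size of the ranges queried rather than on the total volume of data in the bucket. Athena is serverless and bills by the amount of data scanned, so a workload that runs only once each week pays nothing for idle compute. Option B also partitions by order date, but Amazon Redshift generally adds compute and storage costs that a once-weekly workload does not justify. Options C and D partition by load date, which does not match the order date filter, so a date-range query cannot reliably skip irrelevant partitions. Amazon EMR and Amazon Aurora also require infrastructure that must be provisioned and paid for.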
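The effect of partitioning by order date can be illustrated with a minimal Python sketch. It makes no AWS calls; it only models a Hive-style key layout and the pruning a date-range filter allows. The `orders` prefix, the `order_date` partition column name, the string-typed partition values, and all function names are assumptions for illustration and are not stated in the question.

```python
from datetime import date
from typing import List

# Hypothetical S3 key prefix; the question does not name the bucket layout.
PREFIX = "orders"


def partition_path(order_date: date) -> str:
    """Return a Hive-style partition prefix, e.g. orders/order_date=2024-01-15/."""
    return f"{PREFIX}/order_date={order_date.isoformat()}/"


def prune(partitions: List[date], start: date, end: date) -> List[date]:
    """Keep only the partitions whose order date falls within [start, end]."""
    return [d for d in partitions if start <= d <= end]


def range_query(table: str, start: date, end: date) -> str:
    """Build a SQL string that filters on the partition column.

    Assumes order_date is a string-typed partition column holding ISO dates,
    so BETWEEN compares the values in date order.
    """
    return (
        f"SELECT order_date, COUNT(*) AS order_count "
        f"FROM {table} "
        f"WHERE order_date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}' "
        f"GROUP BY order_date"
    )


# 30 daily partitions exist, but a one-week query touches only 7 of them.
all_days = [date(2024, 1, d) for d in range(1, 31)]
week = prune(all_days, date(2024, 1, 8), date(2024, 1, 14))
scanned_prefixes = [partition_path(d) for d in week]
```

In this sketch the number of prefixes scanned stays at seven for a one-week range no matter how many daily partitions accumulate. With load date partitioning, an order's date need not match the date it was loaded, so the same filter could not rule out partitions in this way.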