A data engineer needs to optimize the performance of a data pipeline that handles retail…

Question

A data engineer needs to optimize the performance of a data pipeline that handles retail orders. Data about the orders is ingested daily into an Amazon S3 bucket. The data engineer runs queries once each week to extract metrics from the orders data based on the order date for multiple date ranges. The data engineer needs an optimization solution that ensures the query performance will not degrade when the volume of data increases. Which solution will meet this requirement MOST cost-effectively?

Accepted Answer

Correct answer: A. Partition the data based on order date. Use Amazon Athena to query the data. The weekly queries filter on order date, so partitioning by order date lets Athena read only the partitions that fall inside each requested date range. Query time and cost then track the size of the date range rather than the total volume of data in the bucket. Athena is serverless and bills by the amount of data scanned, which suits a workload that runs only once a week. Option B uses Amazon Redshift, which may incur higher storage and compute costs for such an infrequent workload. Options C and D partition the data by load date, which does not match the order-date filter the queries use, so those queries cannot prune partitions effectively.
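To make the partitioning idea concrete, here is a minimal Python sketch of a Hive-style S3 layout keyed by order date, plus a query that filters on the partition column. The bucket name, table name, and the `order_total` column are hypothetical and are not given in the question. The sketch only builds strings; it does not call AWS.

```python
from datetime import date

# Hypothetical names for illustration only; the question does not specify them.
BUCKET = "example-orders-bucket"
TABLE = "orders"


def partition_prefix(order_date: date) -> str:
    """Return the Hive-style S3 prefix that holds one order date's data."""
    return f"s3://{BUCKET}/{TABLE}/order_date={order_date.isoformat()}/"


def range_query(start: date, end: date) -> str:
    """Build a metrics query that filters on the partition column.

    Because order_date is the partition key, a query engine such as
    Athena can skip every partition outside the requested range.
    order_total is an assumed column name.
    """
    return (
        "SELECT COUNT(*) AS order_count, SUM(order_total) AS revenue "
        f"FROM {TABLE} "
        f"WHERE order_date BETWEEN '{start.isoformat()}' "
        f"AND '{end.isoformat()}'"
    )


if __name__ == "__main__":
    print(partition_prefix(date(2024, 1, 15)))
    print(range_query(date(2024, 1, 1), date(2024, 1, 7)))
```

With this layout, a one-week query touches seven prefixes no matter how many years of orders the bucket holds. Partitioning by load date would not give the same benefit, because an order-date filter would not map to a fixed set of prefixes.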

AWS Certified Data Engineer – Associate (DEA-C01) — Question 205
