AWS Certified Data Engineer – Associate (DEA-C01) — Question 50
A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies.
A data engineer wants to reduce the cost of the company's Amazon Athena usage without adding any infrastructure costs.
Which solution will meet these requirements with the LEAST operational overhead?
Answer options
- A. Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
- B. Use the query result reuse feature of Amazon Athena for the SQL queries.
- C. Add an Amazon ElastiCache cluster between the BI application and Athena.
- D. Change the format of the files that are in the dataset to Apache Parquet.
Correct answer: B
Explanation
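The correct answer is B. The dataset changes only once a day, while the BI application refreshes every hour, so most queries run repeatedly against unchanged data. With query result reuse enabled, Athena returns the stored results of an identical earlier query, within a maximum age that you configure, instead of scanning the data again. Because Athena on-demand queries are billed by the amount of data scanned, this reduces cost without any additional infrastructure, and it only requires a per-query setting. Option A does not work because Athena cannot query objects in the S3 Glacier Deep Archive storage class unless they are first restored, so the dataset would become unavailable for the hourly BI queries. Option C adds infrastructure cost and operational work, which violates the requirement. Option D would reduce the amount of data scanned and so could also lower cost, but converting a petabyte-scale dataset to Apache Parquet and changing the AWS Glue job involves more operational overhead than enabling result reuse.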
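As an illustration of option B, the following minimal sketch shows how a query could be submitted with result reuse enabled through the boto3 `start_query_execution` call. The helper names `build_reuse_request` and `run_query`, the database name, and the S3 output location are hypothetical and not part of the question. The 60-minute maximum age is an assumption chosen to match the 1-hour refresh policy.

```python
from typing import Any, Dict


def build_reuse_request(
    sql: str,
    database: str,
    output_location: str,
    max_age_minutes: int = 60,
) -> Dict[str, Any]:
    """Build keyword arguments for Athena's StartQueryExecution with result reuse.

    max_age_minutes=60 is an assumption that mirrors the 1-hour BI refresh policy.
    """
    # The Athena API documents a range of 0 to 10080 minutes (7 days) for this value.
    if not 0 <= max_age_minutes <= 10080:
        raise ValueError("max_age_minutes must be between 0 and 10080")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical database, table, and bucket names, for illustration only.
    request = build_reuse_request(
        sql="SELECT region, SUM(amount) FROM trades GROUP BY region",
        database="example_bi_db",
        output_location="s3://example-athena-results/",
    )
    print(request["ResultReuseConfiguration"])
```

Athena can only reuse results when the query string matches an earlier query, so a BI application that issues the same SQL every hour is the kind of workload that would be expected to benefit. Setting the maximum age per request keeps the change confined to the query call rather than the data or the infrastructure.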