AWS Certified Data Engineer – Associate (DEA-C01) — Question 108
A marketing company uses Amazon S3 to store clickstream data. The company queries the data at the end of each day by using a SQL JOIN clause on S3 objects that are stored in separate buckets.
The company creates key performance indicators (KPIs) based on the objects. The company needs a serverless solution that will give users the ability to query data by partitioning the data. The solution must maintain the atomicity, consistency, isolation, and durability (ACID) properties of the data.
Which solution will meet these requirements MOST cost-effectively?
Answer options
- A. Amazon S3 Select
- B. Amazon Redshift Spectrum
- C. Amazon Athena
- D. Amazon EMR
Correct answer: C
Explanation
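Amazon Athena is the best fit. It is a serverless, interactive query service that runs standard SQL directly against data in Amazon S3, including JOINs across tables whose data sits in different buckets. Athena bills by the amount of data scanned, so partitioning the data reduces both query time and cost. The ACID requirement is met through Athena's support for Apache Iceberg tables, which provide ACID transactions. The other options fall short. Amazon S3 Select queries a single object per request and cannot perform JOINs. Amazon Redshift Spectrum runs through an Amazon Redshift cluster, which adds cost for a once-a-day workload. Amazon EMR requires provisioning and managing clusters, which adds cost and operational overhead.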
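To make the explanation concrete, the following is a minimal sketch of how a partitioned Iceberg table and an end-of-day JOIN could be expressed for Athena. The table names, column names, S3 locations, and helper functions are all hypothetical and are not part of the question. The `run_query` helper assumes `boto3` is installed and AWS credentials are configured; it is shown only to illustrate submitting the SQL.

```python
def build_iceberg_ddl(table: str, location: str) -> str:
    """Return Athena DDL for a partitioned Apache Iceberg table.

    The schema is a made-up clickstream layout used for illustration only.
    """
    return (
        f"CREATE TABLE {table} (\n"
        "  user_id string,\n"
        "  campaign_id string,\n"
        "  page string,\n"
        "  event_date date\n"
        ")\n"
        "PARTITIONED BY (event_date)\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


def build_daily_kpi_query(clicks: str, campaigns: str, day: str) -> str:
    """Return a JOIN query that counts clicks per campaign for one day.

    Filtering on the partition column (event_date) limits the data scanned.
    """
    return (
        "SELECT c.campaign_id, m.campaign_name, COUNT(*) AS clicks\n"
        f"FROM {clicks} c\n"
        f"JOIN {campaigns} m ON c.campaign_id = m.campaign_id\n"
        f"WHERE c.event_date = DATE '{day}'\n"
        "GROUP BY c.campaign_id, m.campaign_name"
    )


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit a query to Athena and return its execution ID.

    Assumes boto3 is installed and credentials are available.
    """
    import boto3  # imported here so the builders above work without it

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

In this sketch the two tables could point at locations in separate buckets, and the daily KPI query would be submitted with something like `run_query(build_daily_kpi_query("clicks", "campaigns", "2024-01-31"), "marketing", "s3://example-results-bucket/athena/")`, where every name is a placeholder.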