AWS Certified Data Engineer – Associate (DEA-C01) — Question 206
A company has several new datasets in CSV and JSON formats. A data engineer needs to make the data available to a team of data analysts who will analyze the data by using SQL queries.
Which solution will meet these requirements in the MOST cost-effective way?
Answer options
- A. Create an Amazon RDS MySQL cluster. Use AWS Glue to transform and load the CSV and JSON files into database tables. Provide the data analysts access to the MySQL cluster.
- B. Create an AWS Glue DataBrew project that contains the new data. Make the DataBrew project available to the data analysts.
- C. Store the data in an Amazon S3 bucket. Use an AWS Glue crawler to catalog the S3 bucket as tables. Create an Amazon Athena workgroup that has a data usage threshold. Grant the data analysts access to the Athena workgroup.
- D. Load the data into Super-fast, Parallel, In-memory Calculation Engine (SPICE) in Amazon QuickSight. Allow the data analysts to create analyses and dashboards in QuickSight.
Correct answer: C
Explanation
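The correct answer is C. Amazon Athena is serverless and bills by the amount of data each query scans, so the analysts can run standard SQL directly against the CSV and JSON files in Amazon S3 without paying for a database that runs continuously. The AWS Glue crawler infers the schema of the files and registers them as tables in the AWS Glue Data Catalog. The data usage threshold on the Athena workgroup lets the team limit or alert on the amount of data scanned, which keeps query costs bounded. Option A requires provisioning and paying for always-on Amazon RDS instances, plus AWS Glue jobs to transform and load the files. Option B does not meet the SQL requirement: AWS Glue DataBrew is a visual data preparation tool, not a SQL query engine. Option D is a business intelligence service for analyses and dashboards; it adds QuickSight user and SPICE capacity charges and does not give the analysts an ad hoc SQL interface over the datasets.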
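The setup in option C can be sketched with boto3 as follows. This is a minimal illustration, not a reference implementation: the crawler name, IAM role ARN, database name, bucket paths, and workgroup name are all hypothetical placeholders, and the sketch assumes the "data usage threshold" is implemented as the per-query scan limit (`BytesScannedCutoffPerQuery`) rather than a workgroup-wide usage alert. The two builder functions only assemble request parameters, so they can be inspected without AWS credentials.

```python
from typing import Any, Dict

# Athena documents a 10 MB minimum for the per-query scan limit.
MIN_CUTOFF_BYTES = 10_000_000


def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for glue.create_crawler (all values are caller-supplied)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def workgroup_request(name: str, results_location: str, max_bytes_per_query: int) -> Dict[str, Any]:
    """Build keyword arguments for athena.create_work_group with a per-query scan limit."""
    if max_bytes_per_query < MIN_CUTOFF_BYTES:
        raise ValueError("per-query limit must be at least 10 MB")
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": results_location},
            # Prevent client-side settings from overriding the workgroup limits.
            "EnforceWorkGroupConfiguration": True,
            "BytesScannedCutoffPerQuery": max_bytes_per_query,
        },
    }


def provision(region: str) -> None:
    """Create the crawler and workgroup. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    glue = boto3.client("glue", region_name=region)
    athena = boto3.client("athena", region_name=region)

    # Hypothetical names, role ARN, and bucket paths; replace with real values.
    crawler = crawler_request(
        name="new-datasets-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        database="analyst_datasets",
        s3_path="s3://example-datasets-bucket/raw/",
    )
    glue.create_crawler(**crawler)
    glue.start_crawler(Name=crawler["Name"])

    athena.create_work_group(
        **workgroup_request(
            name="data-analysts",
            results_location="s3://example-athena-results-bucket/",
            max_bytes_per_query=10 * 1024**3,  # cancel any query that scans more than 10 GiB
        )
    )
```

Calling `provision("us-east-1")` would create both resources in that Region. Analysts would then be granted IAM permissions scoped to the workgroup and would query the crawled tables with ordinary SQL from the Athena console or a SQL client.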