AWS Certified Data Engineer – Associate (DEA-C01) — Question 86
An online retail company stores Application Load Balancer (ALB) access logs in an Amazon S3 bucket. The company wants to use Amazon Athena to query the logs to analyze traffic patterns.
A data engineer creates an unpartitioned table in Athena. As the amount of the data gradually increases, the response time for queries also increases. The data engineer wants to improve the query performance in Athena.
Which solution will meet these requirements with the LEAST operational effort?
Answer options
- A. Create an AWS Glue job that determines the schema of all ALB access logs and writes the partition metadata to AWS Glue Data Catalog.
- B. Create an AWS Glue crawler that includes a classifier that determines the schema of all ALB access logs and writes the partition metadata to AWS Glue Data Catalog.
- C. Create an AWS Lambda function to transform all ALB access logs. Save the results to Amazon S3 in Apache Parquet format. Partition the metadata. Use Athena to query the transformed data.
- D. Use Apache Hive to create bucketed tables. Use an AWS Lambda function to transform all ALB access logs.
Correct answer: B
Explanation
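Option B is correct because an AWS Glue crawler discovers the schema and automatically writes partition metadata to the AWS Glue Data Catalog, so it needs little manual intervention. Once partitions are registered, Athena can skip the partitions that a query's filter excludes (for example, dates outside the requested range) instead of scanning every log file, so response time stays manageable as the data grows. Option A requires writing and maintaining a custom AWS Glue job to do the same work. Option D requires creating and managing Apache Hive bucketed tables in addition to a Lambda transformation. Option C (converting the logs to Parquet and partitioning them) would likely improve performance, but it adds custom Lambda transformation code and a second copy of the data to maintain, which increases operational effort.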
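To illustrate why registering partitions helps, the Python sketch below models partition pruning for ALB log objects. It assumes the documented ALB log key layout (`AWSLogs/<account-id>/elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/<file>`, optionally behind a bucket prefix). The object keys and the helper names `partition_of` and `objects_to_scan` are hypothetical. This is a simplified model of the idea, not how Athena is implemented.

```python
from datetime import date

# Hypothetical object keys following the ALB access log layout:
# [prefix/]AWSLogs/<account-id>/elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/<file>
KEYS = [
    "AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/05/01/a.log.gz",
    "AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/05/01/b.log.gz",
    "AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/05/02/c.log.gz",
    "AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/06/15/d.log.gz",
]


def partition_of(key: str) -> date:
    """Extract the year/month/day values encoded in an ALB log object key."""
    parts = key.split("/")
    i = parts.index("AWSLogs")  # tolerate an optional bucket prefix
    # After AWSLogs: account-id, service, region, yyyy, mm, dd, file
    year, month, day = (int(p) for p in parts[i + 4 : i + 7])
    return date(year, month, day)


def objects_to_scan(keys, start: date, end: date, partitioned: bool):
    """Return the objects a query filtered on [start, end] would have to read."""
    if not partitioned:
        # No partition metadata: every object is read and filtered row by row.
        return list(keys)
    # Partition metadata available: objects outside the range are skipped up front.
    return [k for k in keys if start <= partition_of(k) <= end]
```

For a query on a single day such as 2024-05-01, the partitioned case reads only the two objects for that day, while the unpartitioned case reads all four. The gap widens as more days of logs accumulate, which matches the slowdown described in the question.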