Question

A data engineer is optimizing query performance in Amazon Athena notebooks that use Apache Spark to analyze large datasets that are stored in Amazon S3. The data is partitioned. An AWS Glue crawler updates the partitions. The data engineer wants to minimize the amount of data that is scanned to improve the efficiency of Athena queries. Which solution will meet these requirements?

Accepted Answer

Correct answer: A. Apply partition filters in the queries. Filtering on the partition columns is the best approach because it directly reduces the amount of data scanned: the query engine prunes the partitions that cannot match and reads only the relevant ones from Amazon S3. Invoking the AWS Glue crawler more frequently or organizing the data in a nested structure may improve data management, but neither directly reduces the data scanned. Configuring Spark for in-memory caching can speed up repeated reads, but it does not reduce the amount of data that a query must read from Amazon S3 in the first place.
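To make partition pruning concrete, here is a minimal, self-contained Python sketch. It is a toy model, not how Athena or Spark actually plan queries: it assumes a hypothetical `sales` table stored under Hive-style `year=<value>/month=<value>` prefixes with made-up object sizes, and it simply counts the bytes a query would read with and without a partition filter. In Athena SQL or Spark, the filter corresponds to a predicate on partition columns such as `WHERE year = '2024' AND month = '01'`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Object:
    key: str         # Hive-style key, e.g. "sales/year=2024/month=01/part-0000.parquet"
    size_bytes: int


def partition_values(key: str) -> dict[str, str]:
    """Extract {column: value} pairs from the key=value path segments."""
    values = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            column, value = segment.split("=", 1)
            values[column] = value
    return values


def scan(objects, partition_filter=None):
    """Return (objects_read, bytes_scanned).

    With no filter, every object is read (a full scan). With a filter, only
    objects whose partition values match every filter entry are read, which
    is the idea behind partition pruning.
    """
    selected = []
    for obj in objects:
        parts = partition_values(obj.key)
        if partition_filter is None or all(
            parts.get(col) == val for col, val in partition_filter.items()
        ):
            selected.append(obj)
    return selected, sum(o.size_bytes for o in selected)


# Hypothetical table layout and sizes, for illustration only.
objects = [
    S3Object("sales/year=2023/month=12/part-0000.parquet", 400),
    S3Object("sales/year=2024/month=01/part-0000.parquet", 500),
    S3Object("sales/year=2024/month=01/part-0001.parquet", 300),
    S3Object("sales/year=2024/month=02/part-0000.parquet", 600),
]

_, full = scan(objects)
_, pruned = scan(objects, {"year": "2024", "month": "01"})
print(full, pruned)  # 1800 800
```

Filtering on `year` and `month` reads only the two matching objects (800 of 1,800 bytes). A filter on a non-partition column would not allow this kind of pruning, because that column's values are not encoded in the object keys.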

AWS Certified Data Engineer – Associate (DEA-C01) — Question 220

Explanation