A data engineer is using an AWS Glue crawler to catalog data that is in an Amazon S3 buck…

Question

A data engineer is using an AWS Glue crawler to catalog data that is in an Amazon S3 bucket. The S3 bucket contains both .csv and .json files. The data engineer configured the crawler to exclude the .json files from the catalog. When the data engineer runs queries in Amazon Athena, the queries also process the excluded .json files. The data engineer wants to resolve this issue. The data engineer needs a solution that will not affect access requirements for the .csv files in the source S3 bucket. Which solution will meet this requirement with the SHORTEST query times?

Accepted Answer

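Correct answer: C. Relocate the .json files to a different path within the S3 bucket. Athena reads every object under the table's S3 location and does not honor exclude patterns configured on an AWS Glue crawler, which is why the crawler exclusion alone leaves the .json files in the scan. Moving the .json files to a different path takes them out of the location that Athena scans, so queries read only the .csv files and finish in the shortest time. The .csv files stay where they are, so their access requirements are unchanged. Adjusting the AWS Glue crawler settings or using the Athena console to exclude the files does not physically remove them from the query path, and S3 bucket policies control access to objects rather than which files a query processes.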
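The relocation in option C could be scripted along the following lines. This is only a minimal sketch under stated assumptions: the prefixes `sales/` and `sales-json/`, the bucket name, and the helper names `plan_json_relocation` and `relocate` are hypothetical and do not come from the question. The `s3_client` argument is assumed to be a boto3 S3 client, and only its `copy_object` and `delete_object` calls are used.

```python
from typing import Iterable, List, Tuple


def plan_json_relocation(
    keys: Iterable[str], table_prefix: str, archive_prefix: str
) -> List[Tuple[str, str]]:
    """Return (old_key, new_key) pairs for each .json object under table_prefix.

    Both prefixes are hypothetical examples; archive_prefix must sit outside
    table_prefix, otherwise Athena would still scan the moved files.
    """
    if archive_prefix.startswith(table_prefix):
        raise ValueError("archive_prefix must not be inside table_prefix")

    moves = []
    for key in keys:
        # Only .json objects inside the table location are relocated;
        # .csv objects are left untouched.
        if key.startswith(table_prefix) and key.lower().endswith(".json"):
            relative = key[len(table_prefix):]
            moves.append((key, archive_prefix + relative))
    return moves


def relocate(s3_client, bucket: str, moves: List[Tuple[str, str]]) -> None:
    """Apply the planned moves. S3 has no rename, so each move is copy + delete."""
    for old_key, new_key in moves:
        s3_client.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3_client.delete_object(Bucket=bucket, Key=old_key)
```

With a real client this might be called as `relocate(boto3.client("s3"), "example-bucket", moves)`, where `moves` comes from `plan_json_relocation` applied to the listed object keys. Separating the plan from the copy step makes it possible to review which keys will move before anything in the bucket changes.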

AWS Certified Data Engineer – Associate (DEA-C01) — Question 129

Explanation